Constructor
def __init__(self, user_name, api_key)
Constructor for the PDFCrowd API client.
- Parameters:
-
-
user_name
- Your username at PDFCrowd.
-
api_key
- Your API key.
-
Conversion Format
def setOutputFormat(self, output_format)
The format of the output file.
- Parameter:
-
-
output_format
- Allowed Values:
-
-
png
-
jpg
-
gif
-
tiff
-
bmp
-
ico
-
ppm
-
pgm
-
pbm
-
pnm
-
psb
-
pct
-
ras
-
tga
-
sgi
-
sun
-
webp
-
- Default:
-
png
-
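One way to keep the format value in sync with the output file name is to derive it from the file extension. The sketch below is illustrative only: `format_from_path` and `set_format_for` are hypothetical helpers, not part of the PDFCrowd client, and the `jpeg` and `tif` aliases are an added convenience. The `client` argument is assumed to be any object exposing `setOutputFormat` as documented above.

```python
import os

# Allowed values copied from the list above.
ALLOWED_FORMATS = {
    "png", "jpg", "gif", "tiff", "bmp", "ico", "ppm", "pgm", "pbm",
    "pnm", "psb", "pct", "ras", "tga", "sgi", "sun", "webp",
}

# Hypothetical aliases for common extensions that are not in the list.
ALIASES = {"jpeg": "jpg", "tif": "tiff"}


def format_from_path(file_path):
    """Return the output_format value matching a file extension."""
    ext = os.path.splitext(file_path)[1].lstrip(".").lower()
    ext = ALIASES.get(ext, ext)
    if ext not in ALLOWED_FORMATS:
        raise ValueError("unsupported output format: %r" % ext)
    return ext


def set_format_for(client, file_path):
    """Call setOutputFormat on a client based on the target file name."""
    client.setOutputFormat(format_from_path(file_path))
```

Validating locally reports a typo in the format name before any request is sent.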
Conversion Input
def convertUrl(self, url)
Convert a web page.
- Parameter:
-
-
url
- The address of the web page to convert.
- Constraint:
-
- Supported protocols are http:// and https://.
-
- Returns:
- byte[] - Byte array containing the conversion output.
def convertUrlToStream(self, url, out_stream)
Convert a web page and write the result to an output stream.
- Parameters:
-
-
url
- The address of the web page to convert.
- Constraint:
-
- Supported protocols are http:// and https://.
-
out_stream
(OutputStream) - The output stream that will contain the conversion output.
-
def convertUrlToFile(self, url, file_path)
Convert a web page and write the result to a local file.
- Parameters:
-
-
url
- The address of the web page to convert.
- Constraint:
-
- Supported protocols are http:// and https://.
-
file_path
- The output file path.
-
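The protocol constraint on `url` can be checked locally before a conversion is requested. This is a minimal sketch using the standard library; `is_convertible_url` and `convert_url_to_file` are hypothetical helper names, and `client` is assumed to be an object providing `convertUrlToFile` as described above.

```python
from urllib.parse import urlparse


def is_convertible_url(url):
    """Return True if the URL uses http:// or https:// and names a host."""
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def convert_url_to_file(client, url, file_path):
    """Validate the URL locally, then delegate to convertUrlToFile."""
    if not is_convertible_url(url):
        raise ValueError("only http:// and https:// URLs are supported: %r" % url)
    client.convertUrlToFile(url, file_path)
```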
def convertFile(self, file)
Convert a local file.
- Parameter:
-
-
file
- The path to a local file to convert.
The file can be either a single file or an archive (.tar.gz, .tar.bz2, or .zip).
If the HTML document refers to local external assets (images, style sheets, JavaScript), zip the document together with the assets.
- Constraints:
-
- The file must exist and not be empty.
- The file name must have a valid extension.
-
- Returns:
- byte[] - Byte array containing the conversion output.
def convertFileToStream(self, file, out_stream)
Convert a local file and write the result to an output stream.
- Parameters:
-
-
file
- The path to a local file to convert.
The file can be either a single file or an archive (.tar.gz, .tar.bz2, or .zip).
If the HTML document refers to local external assets (images, style sheets, JavaScript), zip the document together with the assets.
- Constraints:
-
- The file must exist and not be empty.
- The file name must have a valid extension.
-
out_stream
(OutputStream) - The output stream that will contain the conversion output.
-
def convertFileToFile(self, file, file_path)
Convert a local file and write the result to a local file.
- Parameters:
-
-
file
- The path to a local file to convert.
The file can be either a single file or an archive (.tar.gz, .tar.bz2, or .zip).
If the HTML document refers to local external assets (images, style sheets, JavaScript), zip the document together with the assets.
- Constraints:
-
- The file must exist and not be empty.
- The file name must have a valid extension.
-
file_path
- The output file path.
-
def convertString(self, text)
Convert a string.
- Parameter:
-
-
text
- The string content to convert.
-
- Returns:
- byte[] - Byte array containing the conversion output.
def convertStringToStream(self, text, out_stream)
Convert a string and write the output to an output stream.
- Parameters:
-
-
text
- The string content to convert.
-
out_stream
(OutputStream) - The output stream that will contain the conversion output.
-
def convertStringToFile(self, text, file_path)
Convert a string and write the output to a file.
- Parameters:
-
-
text
- The string content to convert.
-
file_path
- The output file path.
-
def convertStream(self, in_stream)
Convert the contents of an input stream.
- Parameter:
-
-
in_stream
(InputStream) - The input stream with source data.
The stream can contain either HTML code or an archive (.zip, .tar.gz, .tar.bz2).
The archive can contain HTML code and its external assets (images, style sheets, javascript).
-
- Returns:
- byte[] - Byte array containing the conversion output.
def convertStreamToStream(self, in_stream, out_stream)
Convert the contents of an input stream and write the result to an output stream.
- Parameters:
-
-
in_stream
(InputStream) - The input stream with source data.
The stream can contain either HTML code or an archive (.zip, .tar.gz, .tar.bz2).
The archive can contain HTML code and its external assets (images, style sheets, JavaScript).
-
out_stream
(OutputStream) - The output stream that will contain the conversion output.
-
def convertStreamToFile(self, in_stream, file_path)
Convert the contents of an input stream and write the result to a local file.
- Parameters:
-
-
in_stream
(InputStream) - The input stream with source data.
The stream can contain either HTML code or an archive (.zip, .tar.gz, .tar.bz2).
The archive can contain HTML code and its external assets (images, style sheets, JavaScript).
-
file_path
- The output file path.
-
def setZipMainFilename(self, filename)
Set the file name of the main HTML document stored in the input archive. If not specified, the first HTML file in the archive is used for conversion. Use this method if the input archive contains multiple HTML documents.
- Parameter:
-
-
filename
- The file name.
-
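The archive-based input described above (an HTML document zipped together with its assets, with the main document selected by `setZipMainFilename`) could be prepared roughly as follows. This is a sketch under assumptions: `bundle_site` and `convert_bundle` are hypothetical helpers, the archive is written outside the source directory so it does not include itself, and `client` is assumed to offer `setZipMainFilename` and `convertFileToFile` as documented.

```python
import os
import zipfile


def bundle_site(source_dir, archive_path):
    """Zip every file under source_dir, keeping paths relative to it."""
    names = []
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for root, _dirs, files in os.walk(source_dir):
            for name in sorted(files):
                full_path = os.path.join(root, name)
                rel_path = os.path.relpath(full_path, source_dir)
                # Use forward slashes so relative asset links match archive names.
                arc_name = rel_path.replace(os.sep, "/")
                archive.write(full_path, arc_name)
                names.append(arc_name)
    return names


def convert_bundle(client, source_dir, archive_path, main_html, out_path):
    """Bundle a directory, pick the main document, and convert it."""
    names = bundle_site(source_dir, archive_path)
    if main_html not in names:
        raise ValueError("main document not found in archive: %r" % main_html)
    client.setZipMainFilename(main_html)
    client.convertFileToFile(archive_path, out_path)
```

Setting the main file name explicitly avoids relying on whichever HTML file happens to come first in the archive.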
Image Output
def setScreenshotWidth(self, width)
Set the output image width in pixels.
- Parameter:
-
-
width
(int)
- Constraint:
-
- The accepted range is 96-65000.
- Default:
-
1024
-
- Example:
-
-
Full HD width:
setScreenshotWidth(1920)
def setScreenshotHeight(self, height)
Set the output image height in pixels. If it is not specified, actual document height is used.
- Parameter:
-
-
height
(int)
- Constraint:
-
- Must be a positive integer.
-
- Example:
-
-
Full HD height:
setScreenshotHeight(1080)
def setScaleFactor(self, factor)
Set the scaling factor (zoom) for the output image.
- Parameter:
-
-
factor
(int) - The percentage value.
- Constraint:
-
- Must be a positive integer.
- Default:
-
100
-
- Example:
-
-
Reduce image for thumbnails:
setScaleFactor(50)
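The three size-related setters can be applied together with their documented constraints checked up front. The sketch below is not part of the PDFCrowd client: `apply_image_size` is a hypothetical helper, and `RecordingClient` is a stand-in that only records calls so the example runs without network access. A real client object would be passed in its place.

```python
def apply_image_size(client, width=1024, height=None, scale_percent=100):
    """Validate the documented constraints, then configure the client."""
    if not 96 <= width <= 65000:
        raise ValueError("width must be in the range 96-65000")
    if height is not None and height <= 0:
        raise ValueError("height must be a positive integer")
    if scale_percent <= 0:
        raise ValueError("scale factor must be a positive integer")
    client.setScreenshotWidth(width)
    if height is not None:
        # Leaving the height unset makes the converter use the document height.
        client.setScreenshotHeight(height)
    client.setScaleFactor(scale_percent)


class RecordingClient:
    """Stand-in that records method calls instead of contacting the API."""

    def __init__(self):
        self.calls = []

    def __getattr__(self, name):
        # Only reached for attributes that do not exist, i.e. the setters.
        def record(*args):
            self.calls.append((name, args))
        return record


demo = RecordingClient()
apply_image_size(demo, width=1920, height=1080, scale_percent=50)
```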
def setBackgroundColor(self, color)
The output image background color.
- Availability:
- API client >= 5.0.0, converter >= 20.10. See versioning.
- Parameter:
-
-
color
- Constraint:
-
- The value must be in RRGGBB or RRGGBBAA hexadecimal format.
-
- Examples:
-
-
red color:
setBackgroundColor("FF0000")
-
fully transparent background:
setBackgroundColor("00000000")
-
green color with 50% opacity:
setBackgroundColor("00ff0080")
-
green color:
setBackgroundColor("00ff00")
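The RRGGBB / RRGGBBAA constraint can be checked, or the value built from numeric channels, with a couple of small helpers. These are hypothetical utilities rather than client functions; the alpha channel is assumed to follow the usual convention where 0 is fully transparent and 255 is fully opaque, consistent with the examples above.

```python
import re

# Six or eight hexadecimal digits, with no leading '#'.
HEX_COLOR = re.compile(r"[0-9a-fA-F]{6}|[0-9a-fA-F]{8}")


def is_valid_color(value):
    """Return True for RRGGBB or RRGGBBAA hexadecimal strings."""
    return HEX_COLOR.fullmatch(value) is not None


def rgba_to_hex(red, green, blue, alpha=None):
    """Build a color string from 0-255 channel values."""
    channels = [red, green, blue] + ([] if alpha is None else [alpha])
    if any(not 0 <= c <= 255 for c in channels):
        raise ValueError("channel values must be in the range 0-255")
    return "".join("%02x" % c for c in channels)
```

For instance, `rgba_to_hex(0, 255, 0, 128)` yields the "green color with 50% opacity" value shown in the examples.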
General Options
def setUsePrintMedia(self, value)
Use the print version of the page if available (@media print).
- Parameter:
-
-
value
(bool) - Set to True to use the print version of the page.
- Default:
-
False
-
def setNoBackground(self, value)
Do not print the background graphics.
- Parameter:
-
-
value
(bool) - Set to True to disable the background graphics.
- Default:
-
False
-
def setDisableJavascript(self, value)
Do not execute JavaScript.
- Parameter:
-
-
value
(bool) - Set to True to disable JavaScript in web pages.
- Default:
-
False
-
def setDisableImageLoading(self, value)
Do not load images.
- Parameter:
-
-
value
(bool) - Set to True to disable loading of images.
- Default:
-
False
-
def setDisableRemoteFonts(self, value)
Disable loading fonts from remote sources.
- Parameter:
-
-
value
(bool) - Set to True to disable loading remote fonts.
- Default:
-
False
-
def setUseMobileUserAgent(self, value)
Use a mobile user agent.
- Availability:
- API client >= 5.3.0, converter >= 20.10. See versioning.
- Parameter:
-
-
value
(bool) - Set to True to use a mobile user agent.
- Default:
-
False
-
def setLoadIframes(self, iframes)
Specifies how iframes are handled.
- Availability:
- API client >= 5.0.0, converter >= 20.10. See versioning.
- Parameter:
-
-
iframes
- Allowed Values:
-
-
all: All iframes are loaded.
-
same-origin: Only iframes with the same origin as the main page are loaded.
-
none: Iframe loading is disabled.
-
- Default:
-
all
-
def setBlockAds(self, value)
Try to block ads. Enabling this option can produce smaller output and speed up the conversion.
- Parameter:
-
-
value
(bool) - Set to True to block ads in web pages.
- Default:
-
False
-
def setDefaultEncoding(self, encoding)
Set the default HTML content text encoding.
- Parameter:
-
-
encoding
- The text encoding of the HTML content.
- Default:
-
auto detect
-
- Examples:
-
-
Set to use Latin-2 encoding:
setDefaultEncoding("iso8859-2")
-
Set to use UTF-8 encoding:
setDefaultEncoding("utf-8")
def setLocale(self, locale)
Set the locale for the conversion. This may affect the output format of dates, times and numbers.
- Availability:
- API client >= 5.0.0, converter >= 20.10. See versioning.
- Parameter:
-
-
locale
- The locale code according to ISO 639.
- Default:
-
en-US
-
- Example:
-
-
Set to use Japanese locale:
setLocale("ja-JP")
def setHttpAuth(self, user_name, password)
Set credentials to access websites protected by HTTP basic authentication.
- Parameters:
-
-
user_name
- Set the HTTP authentication user name.
-
password
- Set the HTTP authentication password.
-
def setVerifySslCertificates(self, value)
Do not allow insecure HTTPS connections.
- Parameter:
-
-
value
(bool) - Set to True to enable SSL certificate verification.
- Default:
-
False
-
def setFailOnMainUrlError(self, fail_on_error)
Abort the conversion if the main URL HTTP status code is greater than or equal to 400.
- Parameter:
-
-
fail_on_error
(bool) - Set to True to abort the conversion.
- Default:
-
False
-
def setFailOnAnyUrlError(self, fail_on_error)
Abort the conversion if the HTTP status code of any sub-request is greater than or equal to 400, or if some sub-requests are still pending. See the debug log for details.
- Parameter:
-
-
fail_on_error
(bool) - Set to True to abort the conversion.
- Default:
-
False
-
def setNoXpdfcrowdHeader(self, value)
Do not send the X-Pdfcrowd HTTP header in PDFCrowd HTTP requests.
- Parameter:
-
-
value
(bool) - Set to True to disable sending the X-Pdfcrowd HTTP header.
- Default:
-
False
-
def setCustomCss(self, css)
Apply custom CSS to the input HTML document. It allows you to modify the visual appearance and layout of your HTML content dynamically. Tip: Using !important in custom CSS provides a way to prioritize and override conflicting styles.
- Availability:
- API client >= 5.14.0, converter >= 20.10. See versioning.
- Parameter:
-
-
css
- A string containing valid CSS.
-
- Examples:
-
-
Set the page background color to gray:
setCustomCss("body { background-color: gray; }")
-
Do not show nav HTML elements and the element with the ad-block ID in the output:
setCustomCss("nav, #ad-block { display: none !important; }")
def setCustomJavascript(self, javascript)
Run a custom JavaScript after the document is loaded and ready to print. The script is intended for post-load DOM manipulation (add/remove elements, update CSS, ...). In addition to the standard browser APIs, the custom JavaScript code can use helper functions from our JavaScript library.
- Parameter:
-
-
javascript
- A string containing JavaScript code.
-
- Example:
-
-
Set the page background color to gray:
setCustomJavascript("document.body.style.setProperty('background-color', 'gray', 'important')")
def setOnLoadJavascript(self, javascript)
Run a custom JavaScript right after the document is loaded. The script is intended for early DOM manipulation (add/remove elements, update CSS, ...). In addition to the standard browser APIs, the custom JavaScript code can use helper functions from our JavaScript library.
- Parameter:
-
-
javascript
- A string containing JavaScript code.
-
- Example:
-
-
Set the page background color to gray:
setOnLoadJavascript("document.body.style.setProperty('background-color', 'gray', 'important')")
def setCustomHttpHeader(self, header)
Set a custom HTTP header to be included in all requests made by the converter.
- Parameter:
-
-
header
- Constraint:
-
- A string containing the header name and value separated by a colon.
-
- Example:
-
-
API client tracking header:
setCustomHttpHeader("X-My-Client-ID:k2017-12345")
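Since the header is passed as a single "name:value" string, a small formatter can guard against malformed input. `format_header` below is a hypothetical helper, and its checks (no colon in the name, no line breaks in the value) are general HTTP hygiene rather than rules stated in this reference.

```python
def format_header(name, value):
    """Join a header name and value with a colon, as the setter expects."""
    name = name.strip()
    if not name or ":" in name:
        raise ValueError("invalid header name: %r" % name)
    if "\r" in value or "\n" in value:
        raise ValueError("header values must not contain line breaks")
    return "%s:%s" % (name, value.strip())
```

The result can then be passed on, for example `client.setCustomHttpHeader(format_header("X-My-Client-ID", "k2017-12345"))`, where `client` is assumed to be an already constructed client object.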
def setJavascriptDelay(self, delay)
Wait the specified number of milliseconds for JavaScript to finish after the document is loaded. The maximum wait time is defined by the "Max Delay" parameter of your license.
- Parameter:
-
-
delay
(int) - The number of milliseconds to wait.
- Constraint:
-
- Must be a positive integer or 0.
- Default:
-
200
-
- Example:
-
-
Wait for 2 seconds:
setJavascriptDelay(2000)
def setElementToConvert(self, selectors)
Convert only the specified element from the main document and its children. The element is specified by one or more CSS selectors. If the element is not found, the conversion fails. If multiple elements are found, the first one is used.
- Parameter:
-
-
selectors
- One or more CSS selectors separated by commas.
-
- Examples:
-
-
The first element with the id main-content is converted:
setElementToConvert("#main-content")
-
The first element with the class name main-content is converted:
setElementToConvert(".main-content")
-
The first element with the tag name table is converted:
setElementToConvert("table")
-
The first element with the tag name table or with the id main-content is converted:
setElementToConvert("table, #main-content")
-
The first element <p class="article"> within <div class="user-panel main"> is converted:
setElementToConvert("div.user-panel.main p.article")
def setElementToConvertMode(self, mode)
Specify the DOM handling when only a part of the document is converted. This can affect the CSS rules used.
- Parameter:
-
-
mode
- Allowed Values:
-
-
cut-out: The element and its children are cut out of the document.
-
remove-siblings: All siblings of the element are removed.
-
hide-siblings: All siblings of the element are hidden.
-
- Default:
-
cut-out
-
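The selector string and the DOM handling mode are usually set together. The following sketch assumes a `client` object with the two setters documented above; `join_selectors` and `convert_only` are hypothetical helper names introduced for illustration.

```python
# Allowed values copied from the setElementToConvertMode description.
ELEMENT_MODES = ("cut-out", "remove-siblings", "hide-siblings")


def join_selectors(selectors):
    """Combine CSS selectors into one comma-separated string."""
    cleaned = [s.strip() for s in selectors if s.strip()]
    if not cleaned:
        raise ValueError("at least one CSS selector is required")
    return ", ".join(cleaned)


def convert_only(client, selectors, mode="cut-out"):
    """Restrict the conversion to the first element matching the selectors."""
    if mode not in ELEMENT_MODES:
        raise ValueError("unknown mode: %r" % mode)
    client.setElementToConvert(join_selectors(selectors))
    client.setElementToConvertMode(mode)
```

Because the conversion fails when no element matches, it may be worth confirming the selectors against the page in a browser first.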
def setWaitForElement(self, selectors)
Wait for the specified element in a source document. The element is specified by one or more CSS selectors. The element is searched for in the main document and all iframes. If the element is not found, the conversion fails. The maximum wait time is defined by the "Max Delay" parameter of your license.
- Parameter:
-
-
selectors
- One or more CSS selectors separated by commas.
-
- Examples:
-
-
Wait until an element with the id main-content is found:
setWaitForElement("#main-content")
-
Wait until an element with the class name main-content is found:
setWaitForElement(".main-content")
-
Wait until an element with the tag name table is found:
setWaitForElement("table")
-
Wait until an element with the tag name table or with the id main-content is found:
setWaitForElement("table, #main-content")
-
Wait until <p class="article"> is found within <div class="user-panel main">:
setWaitForElement("div.user-panel.main p.article")
def setAutoDetectElementToConvert(self, value)
The main HTML element for conversion is detected automatically.
- Availability:
- API client >= 5.5.0, converter >= 20.10. See versioning.
- Parameter:
-
-
value
(bool) - Set to True to detect the main element.
- Default:
-
False
-
def setReadabilityEnhancements(self, enhancements)
The input HTML is automatically enhanced to improve the readability.
- Availability:
- API client >= 5.5.0, converter >= 20.10. See versioning.
- Parameter:
-
-
enhancements
- Allowed Values:
-
-
none: No enhancements are used.
-
readability-v1: Version 1 of the enhancements is used.
-
readability-v2: Version 2 of the enhancements is used.
-
readability-v3: Version 3 of the enhancements is used.
-
readability-v4: Version 4 of the enhancements is used.
-
- Default:
-
none
-
Data
Methods related to HTML template rendering.
def setDataString(self, data_string)
Set the input data for template rendering. The data format can be JSON, XML, YAML or CSV.
- Parameter:
-
-
data_string
- The input data string.
-
- Example:
-
-
Template variables for mail merge:
setDataString('{"recipient": "Anna May", "sender": "John Doe"}')
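Writing JSON by hand inside a Python string literal is easy to get wrong because of nested quotes. A safer route, sketched here with the standard `json` module, is to serialize a dictionary; `to_data_string` is a hypothetical helper, not a client method.

```python
import json


def to_data_string(variables):
    """Serialize template variables to a JSON string."""
    return json.dumps(variables, ensure_ascii=False)


data = to_data_string({"recipient": "Anna May", "sender": "John Doe"})
# The result could then be passed on as: client.setDataString(data)
```

With this approach the quoting is handled by the serializer, and non-ASCII names are kept as-is rather than escaped.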
def setDataFile(self, data_file)
Load the input data for template rendering from the specified file. The data format can be JSON, XML, YAML or CSV.
- Parameter:
-
-
data_file
- The file path to a local file containing the input data.
-
- Example:
-
-
External data for template rendering:
setDataFile("/home/user/john/data.json")
def setDataFormat(self, data_format)
Specify the input data format.
- Parameter:
-
-
data_format
- The data format.
- Allowed Values:
-
-
auto: The data format is auto-detected.
-
json
-
xml
-
yaml
-
csv
-
- Default:
-
auto
-
def setDataEncoding(self, encoding)
Set the encoding of the data file set by setDataFile.
- Parameter:
-
-
encoding
- The data file encoding.
- Default:
-
utf-8
-
- Example:
-
-
Set to use Latin-2 encoding:
setDataEncoding("iso8859-2")
def setDataIgnoreUndefined(self, value)
Ignore undefined variables in the HTML template. The default mode is strict so any undefined variable causes the conversion to fail. You can use {% if variable is defined %} to check if the variable is defined.
- Parameter:
-
-
value
(bool) - Set to True to ignore undefined variables.
- Default:
-
False
-
def setDataAutoEscape(self, value)
Auto escape HTML symbols in the input data before placing them into the output.
- Parameter:
-
-
value
(bool) - Set to True to turn auto escaping on.
- Default:
-
False
-
def setDataTrimBlocks(self, value)
Auto trim whitespace around each template command block.
- Parameter:
-
-
value
(bool) - Set to True to turn auto trimming on.
- Default:
-
False
-
def setDataOptions(self, options)
Supported options:
-
csv_delimiter
- The CSV data delimiter. The default is a comma (,).
-
xml_remove_root
- Remove the root XML element from the input data.
-
data_root
- The name of the root element inserted into input data that has no root node (e.g. CSV). The default is data.
- Parameter:
-
-
options
- Comma separated list of options.
-
- Examples:
-
-
Use semicolon to separate CSV data:
setDataOptions("csv_delimiter=;")
-
Name the data root rows and use that name in the template loop {% for row in rows %}...{% endfor %}:
setDataOptions("data_root=rows")
-
Remove the XML root so that the HTML template can be simpler:
setDataOptions("xml_remove_root=1")
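When several options are needed at once, the comma-separated string can be assembled from keyword arguments. This sketch makes assumptions: `build_data_options` is a hypothetical helper, boolean flags are rendered as `1` following the `xml_remove_root=1` example, and flags set to False are simply omitted because this reference does not say how to switch an option off explicitly.

```python
def build_data_options(**options):
    """Render keyword arguments as a comma-separated key=value list."""
    parts = []
    for key, value in options.items():
        if value is False:
            # Assumption: leaving a flag out keeps its default (off) state.
            continue
        if value is True:
            value = 1
        parts.append("%s=%s" % (key, value))
    return ",".join(parts)
```

A call such as `build_data_options(data_root="rows", xml_remove_root=True)` produces a string suitable for `setDataOptions`.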
Miscellaneous
def setDebugLog(self, value)
Turn on the debug logging. Details about the conversion are stored in the debug log. The URL of the log can be obtained from the getDebugLogUrl method or available in conversion statistics.
- Parameter:
-
-
value
(bool) - Set to True to enable the debug logging.
- Default:
-
False
-
def getDebugLogUrl(self)
Get the URL of the debug log for the last conversion.
- Returns:
- string - The link to the debug log.
def getRemainingCreditCount(self)
Get the number of conversion credits available in your account.
This method can only be called after a call to one of the convertXtoY methods.
The returned value can differ from the actual count if you run parallel conversions.
The special value 999999 is returned if the information is not available.
- Returns:
- int - The number of credits.
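Because 999999 is a sentinel rather than a real balance, callers may want to translate it before using the number. The wrapper below is a hypothetical convenience; `FakeClient` exists only so the sketch runs without a real conversion and is not part of the PDFCrowd library.

```python
CREDITS_UNKNOWN = 999999  # special value documented above


def remaining_credits(client):
    """Return the remaining credit count, or None when it is not available."""
    count = client.getRemainingCreditCount()
    return None if count == CREDITS_UNKNOWN else count


class FakeClient:
    """Stand-in for a client that has already run a conversion."""

    def __init__(self, count):
        self._count = count

    def getRemainingCreditCount(self):
        return self._count
```

Returning None makes the "unknown" case explicit, so it cannot be mistaken for a very large balance in monitoring code.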
def getConsumedCreditCount(self)
Get the number of credits consumed by the last conversion.
- Returns:
- int - The number of credits.
def getJobId(self)
Get the job id.
- Returns:
- string - The unique job identifier.
def getOutputSize(self)
Get the size of the output in bytes.
- Returns:
- int - The count of bytes.
def getVersion(self)
Get the version details.
- Returns:
- string - API version, converter version, and client version.
def setTag(self, tag)
Tag the conversion with a custom value. The tag is used in conversion statistics. A value longer than 32 characters is cut off.
- Parameter:
-
-
tag
- A string with the custom tag.
-
- Example:
-
-
Track job in analytics:
setTag("client-1234")
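Since tags longer than 32 characters are cut off, two long tags that differ only near the end could become indistinguishable in the statistics. One possible guard, sketched with a hypothetical `make_tag` helper, is to fail early instead:

```python
MAX_TAG_LENGTH = 32  # longer values are cut off, per the description above


def make_tag(prefix, identifier):
    """Build a tag and fail early rather than let it be cut off silently."""
    tag = "%s-%s" % (prefix, identifier)
    if len(tag) > MAX_TAG_LENGTH:
        raise ValueError(
            "tag is %d characters; the limit is %d" % (len(tag), MAX_TAG_LENGTH)
        )
    return tag
```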
def setHttpProxy(self, proxy)
A proxy server used by the conversion process for accessing the source URLs with HTTP scheme. It can help to circumvent regional restrictions or provide limited access to your intranet.
- Parameter:
-
-
proxy
- Constraint:
-
- The value must have format DOMAIN_OR_IP_ADDRESS:PORT.
-
- Examples:
-
-
Corporate proxy server:
setHttpProxy("myproxy.com:8080")
-
Direct IP proxy connection:
setHttpProxy("113.25.84.10:33333")
def setHttpsProxy(self, proxy)
A proxy server used by the conversion process for accessing the source URLs with HTTPS scheme. It can help to circumvent regional restrictions or provide limited access to your intranet.
- Parameter:
-
-
proxy
- Constraint:
-
- The value must have format DOMAIN_OR_IP_ADDRESS:PORT.
-
- Examples:
-
-
Secure proxy for HTTPS:
setHttpsProxy("myproxy.com:443")
-
Direct secure proxy IP:
setHttpsProxy("113.25.84.10:44333")
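The DOMAIN_OR_IP_ADDRESS:PORT format required by both proxy setters can be checked before the value is passed on. `parse_proxy` below is a hypothetical helper; the pattern is deliberately simple and assumes host names or IPv4 addresses only, so it rejects IPv6 literals and values that include a scheme.

```python
import re

# Host made of letters, digits, dots and hyphens, then a numeric port.
PROXY_PATTERN = re.compile(r"(?P<host>[A-Za-z0-9.-]+):(?P<port>\d{1,5})")


def parse_proxy(value):
    """Split DOMAIN_OR_IP_ADDRESS:PORT into (host, port) or raise ValueError."""
    match = PROXY_PATTERN.fullmatch(value)
    if match is None:
        raise ValueError("expected DOMAIN_OR_IP_ADDRESS:PORT, got %r" % value)
    port = int(match.group("port"))
    if not 1 <= port <= 65535:
        raise ValueError("port out of range: %d" % port)
    return match.group("host"), port
```

A common mistake this catches is passing a full URL such as `http://myproxy.com:8080` instead of the bare host and port.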
def setClientCertificate(self, certificate)
A client certificate to authenticate the converter on your web server. The certificate is used for two-way SSL/TLS authentication and adds extra security.
- Parameter:
-
-
certificate
- The file must be in PKCS12 format.
- Constraint:
-
- The file must exist and not be empty.
-
- Example:
-
-
Custom CA certificate path:
setClientCertificate("/home/user/john/pdfcrowd.crt")
def setClientCertificatePassword(self, password)
The password for the PKCS12 file containing the client certificate, if one is needed.
- Parameter:
-
-
password
-
- Example:
-
-
PKCS12 certificate password:
setClientCertificatePassword("123456")
Tweaks
Expert options for fine-tuning output.
def setMaxLoadingTime(self, max_time)
Set the maximum time to load the page and its resources. After this time, all requests will be considered successful. This can be useful to ensure that the conversion does not timeout. Use this method if there is no other way to fix page loading.
- Availability:
- API client >= 5.15.0, converter >= 20.10. See versioning.
- Parameter:
-
-
max_time
(int) - The number of seconds to wait.
- Constraint:
-
- The accepted range is 10-30.
-
def setConverterUserAgent(self, agent)
Specifies the User-Agent HTTP header that will be used by the converter when a request is made to the converted web page.
- Availability:
- API client >= 6.4.0. See versioning.
- Parameter:
-
-
agent
- The user agent.
- Allowed Values:
-
-
chrome-desktop: The user agent for desktop Chrome corresponding to the converter used.
-
chrome-mobile: The user agent for mobile Chrome corresponding to the converter used.
-
latest-chrome-desktop: The user agent of the recently released Chrome browser on desktops.
-
latest-chrome-mobile: The user agent of the recently released Chrome browser on mobile devices.
-
custom string: A custom string for the user agent.
-
- Default:
-
latest-chrome-desktop
-
- Examples:
-
-
Mimic the recent chrome on mobiles:
setConverterUserAgent("latest-chrome-mobile")
-
Mimic Safari 18.0 browser:
setConverterUserAgent("Mozilla/5.0 (Macintosh; Intel Mac OS X 14_7_1) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/18.0 Safari/605.1.15")
API Client Options
def setConverterVersion(self, version)
Set the converter version. Different versions may produce different output. Choose which one provides the best output for your case.
- Availability:
- API client >= 5.0.0. See versioning.
- Parameter:
-
-
version
- The version identifier.
- Allowed Values:
-
-
24.04: Version 24.04.
-
20.10: Version 20.10.
-
18.10: Version 18.10.
-
latest: Version 20.10 is used.
-
- Default:
-
24.04
-
def setUseHttp(self, value)
Specify whether to use HTTP or HTTPS when connecting to the PDFCrowd API.
- Parameter:
-
-
value
(bool) - Set to True to use HTTP.
- Default:
-
False
-
def setClientUserAgent(self, agent)
Specifies the User-Agent HTTP header that the client library will use when interacting with the API.
- Availability:
- API client >= 6.4.0. See versioning.
- Parameter:
-
-
agent
- The user agent string.
-
def setUserAgent(self, agent)
Set a custom user agent HTTP header. It can be useful if you are behind a proxy or a firewall.
- Parameter:
-
-
agent
- The user agent string.
- Default:
-
pdfcrowd_python_client/6.5.2 (https://pdfcrowd.com)
-
def setProxy(self, host, port, user_name, password)
Specifies an HTTP proxy that the API client library will use to connect to the internet.
- Parameters:
-
-
host
- The proxy hostname.
-
port
(int) - The proxy port.
-
user_name
- The username.
-
password
- The password.
-
def setRetryCount(self, count)
Specifies the number of automatic retries when the 502 or 503 HTTP status code is received. These status codes indicate a temporary network issue. Set the count to 0 to disable retries.
- Parameter:
-
-
count
(int) - Number of retries.
- Default:
-
1
-
- Example:
-
-
Retry failed requests three times:
setRetryCount(3)