HTML to Image API - Command Line Tool

Convert web pages and HTML documents to various image formats from the command line using the Pdfcrowd API v2.

Installation

Install the application from PyPI
 $ pip install pdfcrowd

You can learn more about other install options here.

Authentication

Authentication is required to use the Pdfcrowd API. The credentials are your Pdfcrowd username and API key. You can sign up for the Pdfcrowd API here.
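
To avoid repeating the credentials on every command line, one option is a small wrapper that reads them from environment variables. This is a minimal sketch: the variable names MY_PDFCROWD_USER and MY_PDFCROWD_KEY and the function name html2image_auth are hypothetical, chosen for illustration, and are not something html2image itself reads or defines.

```shell
# Hypothetical wrapper: supplies credentials from environment variables.
# MY_PDFCROWD_USER / MY_PDFCROWD_KEY are our own names, not read by html2image.
html2image_auth() {
    # Fail early with a clear message if either credential is missing.
    if [ -z "${MY_PDFCROWD_USER:-}" ] || [ -z "${MY_PDFCROWD_KEY:-}" ]; then
        echo "MY_PDFCROWD_USER and MY_PDFCROWD_KEY must be set" >&2
        return 1
    fi
    # Forward all remaining arguments unchanged to html2image.
    html2image -user-name "$MY_PDFCROWD_USER" -api-key "$MY_PDFCROWD_KEY" "$@"
}

# Usage (after exporting both variables):
#   html2image_auth -output-format "png" "http://www.example.com" > example.png
```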

Examples

Convert a web page to a PNG file:

html2image -user-name "your_username" -api-key "your_apikey" \
    -output-format "png" \
    "http://www.example.com" > example.png

Convert a local HTML file to a PNG file:

html2image -user-name "your_username" -api-key "your_apikey" \
    -output-format "png" \
    "/path/to/MyLayout.html" > MyLayout.png

Convert a string containing HTML to a PNG file:

echo -n "<html><body><h1>Hello World!</h1></body></html>" | \
    html2image -user-name "your_username" -api-key "your_apikey" \
    -output-format "png" - > HelloWorld.png
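
The single-file examples above can be extended to a whole directory. The following is a rough sketch, not an official recipe: convert_dir is a hypothetical helper, and it assumes html2image exits with a non-zero status when a conversion fails. Each image is written to a temporary file first so that a failed conversion does not leave a broken PNG behind.

```shell
# Convert every .html file in a directory to a PNG placed next to it.
# convert_dir is a hypothetical helper; credentials are passed as arguments.
convert_dir() {
    dir=$1 user=$2 key=$3
    for src in "$dir"/*.html; do
        # Skip the unexpanded pattern when the directory has no .html files.
        [ -e "$src" ] || continue
        out="${src%.html}.png"
        # Keep the output only if the conversion reports success (assumed:
        # html2image returns a non-zero exit status on failure).
        if html2image -user-name "$user" -api-key "$key" \
               -output-format "png" "$src" > "$out.tmp"; then
            mv "$out.tmp" "$out"
        else
            rm -f "$out.tmp"
            echo "conversion failed: $src" >&2
        fi
    done
}

# Usage:
#   convert_dir "/path/to/pages" "your_username" "your_apikey"
```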

Tips & Tricks

html2image Manual

usage: html2image [options] source

Conversion from HTML to image.

positional arguments:
  source                Source to be converted. It can be URL, path to a local
                        file or '-' to use stdin as an input text.

optional arguments:
  -user-name USER_NAME  Your user name at pdfcrowd.com.
  -api-key API_KEY      Your API key at pdfcrowd.com.
  -output-format OUTPUT_FORMAT
                        The format of the output file. Allowed values are png,
                        jpg, gif, tiff, bmp, ico, ppm, pgm, pbm, pnm, psb,
                        pct, ras, tga, sgi, sun, webp.
  -no-background        Do not print the background graphics.
  -disable-javascript   Do not execute JavaScript.
  -disable-image-loading
                        Do not load images.
  -disable-remote-fonts
                        Disable loading fonts from remote sources.
  -block-ads            Try to block ads. Enabling this option can produce
                        smaller output and speed up the conversion.
  -default-encoding DEFAULT_ENCODING
                        Set the default HTML content text encoding. The text
                        encoding of the HTML content.
  -http-auth HTTP_AUTH  Set credentials to access websites protected by HTTP
                        basic authentication. HTTP_AUTH must contain 2 values
                        separated by a semicolon: the user name and the
                        password.
  -use-print-media      Use the print version of the page if available (@media
                        print).
  -no-xpdfcrowd-header  Do not send the X-Pdfcrowd HTTP header in Pdfcrowd
                        HTTP requests.
  -cookies COOKIES      Set cookies that are sent in Pdfcrowd HTTP requests.
                        The cookie string.
  -verify-ssl-certificates
                        Do not allow insecure HTTPS connections.
  -fail-on-main-url-error
                        Abort the conversion if the main URL HTTP status code
                        is greater than or equal to 400.
  -fail-on-any-url-error
                        Abort the conversion if any of the sub-request HTTP
                        status code is greater than or equal to 400 or if some
                        sub-requests are still pending. See details in a debug
                        log.
  -custom-javascript CUSTOM_JAVASCRIPT
                        Run a custom JavaScript after the document is loaded.
                        The script is intended for post-load DOM manipulation
                        (add/remove elements, update CSS, ...). String
                        containing a JavaScript code. The string must not be
                        empty.
  -custom-http-header CUSTOM_HTTP_HEADER
                        Set a custom HTTP header that is sent in Pdfcrowd HTTP
                        requests. A string containing the header name and
                        value separated by a colon.
  -javascript-delay JAVASCRIPT_DELAY
                        Wait the specified number of milliseconds to finish
                        all JavaScript after the document is loaded. The
                        maximum value is determined by your API license. The
                        number of milliseconds to wait. Must be a positive
                        integer number or 0.
  -element-to-convert ELEMENT_TO_CONVERT
                        Convert only the specified element from the main
                        document and its children. The element is specified by
                        one or more CSS selectors. If the element is not
                        found, the conversion fails. If multiple elements are
                        found, the first one is used. One or more CSS
                        selectors separated by commas. The string must not be
                        empty.
  -element-to-convert-mode ELEMENT_TO_CONVERT_MODE
                        Specify the DOM handling when only a part of the
                        document is converted. Allowed values are cut-out,
                        remove-siblings, hide-siblings.
  -wait-for-element WAIT_FOR_ELEMENT
                        Wait for the specified element in a source document.
                        The element is specified by one or more CSS selectors.
                        The element is searched for in the main document and
                        all iframes. If the element is not found, the
                        conversion fails. Your API license defines the maximum
                        wait time by "Max Delay" parameter. One or more CSS
                        selectors separated by commas. The string must not be
                        empty.
  -screenshot-width SCREENSHOT_WIDTH
                        Set the output image width in pixels. The value must
                        be in a range 96-7680.
  -screenshot-height SCREENSHOT_HEIGHT
                        Set the output image height in pixels. If it's not
                        specified, actual document height is used. Must be a
                        positive integer number.
  -debug-log            Turn on the debug logging. Details about the
                        conversion are stored in the debug log.
  -tag TAG              Tag the conversion with a custom value. The tag is
                        used in conversion statistics. A value longer than 32
                        characters is cut off. A string with the custom tag.
  -use-http             Communicate with the Pdfcrowd API over HTTP instead of
                        HTTPS.
  -user-agent USER_AGENT
                        Set a custom user agent HTTP header. It can be useful
                        if you are behind a proxy or firewall. The user agent
                        string.
  -proxy PROXY          Specifies an HTTP proxy that the API client library
                        will use to connect to the internet. PROXY must
                        contain 4 values separated by a semicolon: the proxy
                        hostname, the proxy port, the username and the
                        password.
  -retry-count RETRY_COUNT
                        Specifies the number of retries when the 502 HTTP
                        status code is received. The 502 status code indicates
                        a temporary network issue. This feature can be
                        disabled by setting to 0. Number of retries wanted.

produced by: www.pdfcrowd.com
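
As an illustration of how several of the options above combine, the sketch below captures a single element at a fixed width. Both function names are hypothetical helpers, and "#main" is a placeholder selector. The width check mirrors the 96-7680 range stated for -screenshot-width, so an out-of-range value is rejected locally before any request is made.

```shell
# Check a width against the documented 96-7680 pixel range.
# valid_screenshot_width is a hypothetical helper, not part of html2image.
valid_screenshot_width() {
    case "$1" in
        # Reject empty input, non-digits, and anything with 5 or more digits.
        ''|*[!0-9]*|?????*) return 1 ;;
    esac
    [ "$1" -ge 96 ] && [ "$1" -le 7680 ]
}

# Capture one element of a page at a fixed width; the image goes to stdout.
# capture_element is a hypothetical helper.
capture_element() {
    url=$1 selector=$2 width=$3
    if ! valid_screenshot_width "$width"; then
        echo "screenshot width must be in the range 96-7680" >&2
        return 2
    fi
    html2image -user-name "your_username" -api-key "your_apikey" \
        -output-format "png" \
        -element-to-convert "$selector" \
        -screenshot-width "$width" \
        "$url"
}

# Usage:
#   capture_element "http://www.example.com" "#main" 1024 > main.png
```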

Troubleshooting

  • Check the API Status Codes if an error code is returned.
  • You can use -debug-log to get detailed info about the conversion, such as conversion errors, time, console output.
  • You can use our JavaScript library to resolve rendering problems, such as missing content or blank pages.
    Just use -custom-javascript with libPdfcrowd.highlightHtml(borders, backgrounds, labels, noZeroSpace) method call to visualize all HTML elements. See example.
  • Take a look at the FAQ section.
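
The two debugging options mentioned above can be combined in one run. This is only a sketch: debug_conversion is a hypothetical helper, and the boolean arguments passed to libPdfcrowd.highlightHtml are an assumed example matching the four parameters named above (borders, backgrounds, labels, noZeroSpace), so adjust them to what you want to visualize.

```shell
# Re-run a conversion with the debug log and element highlighting enabled.
# debug_conversion is a hypothetical helper; the highlightHtml argument
# values below are an assumed example, not a recommended setting.
debug_conversion() {
    src=$1 out=$2
    html2image -user-name "your_username" -api-key "your_apikey" \
        -output-format "png" \
        -debug-log \
        -custom-javascript "libPdfcrowd.highlightHtml(true, true, true, false)" \
        "$src" > "$out"
}

# Usage:
#   debug_conversion "http://www.example.com" debug.png
```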