HTML to PDF Online API

The Pdfcrowd online API is a professional solution that lets you create PDFs from web pages and raw HTML code in your applications. The API is easy to use, and integration takes only a few lines of code.

PDF Samples

Here are some samples created with the API:

  • Status report
  • JavaScript vector chart
  • Wikipedia page
  • Invoice
  • Newsletter

API Client Libraries: Documentation & Downloads

  • PHP Client Library
  • Java Client Library
  • DotNET Client Library
  • Python Client Library
  • Ruby Client Library
  • Others

Code Examples

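The examples below show how to create a PDF from a web page, an HTML file, and an HTML string. They cover, in order, PHP, C#, ASP.NET (C# and VB), Java, Python, Django, Ruby, Ruby on Rails, Node.js, curl, and the pdfcrowd.sh shell script: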


This code converts a web page and sends the generated PDF as an HTTP response:
<?php
require 'pdfcrowd.php';

try
{   
    // create an API client instance
    $client = new Pdfcrowd("username", "apikey");

    // convert a web page and store the generated PDF into a $pdf variable
    $pdf = $client->convertURI('http://www.google.com/');

    // set HTTP response headers
    header("Content-Type: application/pdf");
    header("Cache-Control: no-cache");
    header("Accept-Ranges: none");
    header("Content-Disposition: attachment; filename=\"google_com.pdf\"");

    // send the generated PDF 
    echo $pdf;
}
catch(PdfcrowdException $why)
{
    echo "Pdfcrowd Error: " . $why;
}
?>
You can also convert raw HTML code; just use the convertHtml() method instead of convertURI():
    $pdf = $client->convertHtml("<head></head><body>My HTML Layout</body>");
The API also lets you convert a local HTML file:
    $pdf = $client->convertFile("/path/to/MyLayout.html");
You can save the generated PDF to a file:
    $out_file = fopen("document.pdf", "wb");
    $client->convertHtml("<head></head><body>My HTML Layout</body>", $out_file);
    fclose($out_file);

To learn more, see the Pdfcrowd API PHP client documentation.

using System;
using System.IO;

public class PdfcrowdTest
{
  static void Main() {
    try 
    {
      FileStream fileStream;  

      // create an API client instance
      pdfcrowd.Client client = new pdfcrowd.Client("username", "apikey");

      // convert a web page and save the PDF to a file
      fileStream = new FileStream("google_com.pdf", FileMode.CreateNew);
      client.convertURI("http://www.google.com", fileStream);
      fileStream.Close();

      // convert an HTML string and store the PDF into a memory stream
      MemoryStream memStream = new MemoryStream();
      string html = "<head></head><body>My HTML Layout</body>";
      client.convertHtml(html, memStream);

      // convert an HTML file
      fileStream = new FileStream("file.pdf", FileMode.CreateNew);
      client.convertFile("c:/local/file.html", fileStream);
      fileStream.Close();

      // retrieve the number of tokens in your account
      int ntokens = client.numTokens();
    }
    catch(pdfcrowd.Error why) {
      System.Console.WriteLine(why.ToString());
    }
  }
}

To learn more, see the Pdfcrowd API .NET client documentation.


The following code converts a web page and sends the generated PDF as an HTTP response:
<%-- file: PdfGenerator.aspx --%>
<%@ Page Language="C#" CodeFile="PdfGenerator.aspx.cs" Inherits="Website.PdfGenerator"
         AutoEventWireup="true" %>
// file: PdfGenerator.aspx.cs
using System;
using System.IO;

namespace Website
{
  public partial class PdfGenerator : System.Web.UI.Page
  {
    protected void Page_Load(object sender, EventArgs e)
    {
      System.Web.HttpResponse Response = System.Web.HttpContext.Current.Response;
      try
      {
          // create an API client instance
          pdfcrowd.Client client = new pdfcrowd.Client("username", "apikey");

          // convert a web page and write the generated PDF to a memory stream
          MemoryStream Stream = new MemoryStream();
          client.convertURI("http://www.google.com", Stream);

          // set HTTP response headers
          Response.Clear();
          Response.AddHeader("Content-Type", "application/pdf");
          Response.AddHeader("Cache-Control", "no-cache");
          Response.AddHeader("Accept-Ranges", "none");
          Response.AddHeader("Content-Disposition", "attachment; filename=google_com.pdf");

          // send the generated PDF
          Stream.WriteTo(Response.OutputStream);
          Stream.Close();
          Response.Flush();
          Response.End();
      }
      catch(pdfcrowd.Error why)
      {
          Response.Write(why.ToString());
      }
    }
  }
}
You can also convert raw HTML code; just use the convertHtml() method instead of convertURI():
    client.convertHtml("<head></head><body>My HTML Layout</body>", Stream);
The API also lets you convert a local HTML file:
    client.convertFile("c:/MyLayout.html", Stream);

To learn more, see the Pdfcrowd API .NET client documentation.


The following code converts a web page and sends the generated PDF as an HTTP response:
<%-- file: PdfGenerator.aspx --%>
<%@ Page Language="VB" CodeFile="PdfGenerator.aspx.vb" Inherits="Website.PdfGenerator"
         AutoEventWireup="true" %>
' file: PdfGenerator.aspx.vb
Imports System
Imports System.IO

Namespace Website
  Public Partial Class PdfGenerator
      Inherits System.Web.UI.Page
      Protected Sub Page_Load(ByVal sender As Object, ByVal e As EventArgs)
        Dim Response As System.Web.HttpResponse = System.Web.HttpContext.Current.Response
        Try
            ' create an API client instance
            Dim client As New pdfcrowd.Client("username", "apikey")

            ' convert a web page and write the generated PDF to a memory stream
            Dim Stream As New MemoryStream
            client.convertURI("http://www.google.com", Stream)

            ' set HTTP response headers
            Response.Clear() 
            Response.AddHeader("Content-Type", "application/pdf")
            Response.AddHeader("Cache-Control", "no-cache")
            Response.AddHeader("Accept-Ranges", "none")
            Response.AddHeader("Content-Disposition", "attachment; filename=google_com.pdf") 

            ' send the generated PDF
            Stream.WriteTo(Response.OutputStream)
            Stream.Close()
            Response.Flush() 
            Response.End()
        Catch why As pdfcrowd.Error
            Response.Write(why.ToString())
        End Try
      End Sub
  End Class
End Namespace
You can also convert raw HTML code; just use the convertHtml() method instead of convertURI():
    client.convertHtml("<head></head><body>My HTML Layout</body>", Stream)
The API also lets you convert a local HTML file:
    client.convertFile("c:/MyLayout.html", Stream)

To learn more, see the Pdfcrowd API .NET client documentation.

import com.pdfcrowd.*;
import java.io.*;

public class PdfcrowdTest {
    public static void main(String[] args) {
        try 
        {
            FileOutputStream fileStream;     
 
            // create an API client instance
            Client client = new Client("username", "apikey");

            // convert a web page and save the PDF to a file
            fileStream = new FileOutputStream("google_com.pdf");
            client.convertURI("http://www.google.com/", fileStream);
            fileStream.close();

            // convert an HTML string and store the PDF into a byte array
            ByteArrayOutputStream memStream  = new ByteArrayOutputStream();
            String html = "<head></head><body>My HTML Layout</body>";
            client.convertHtml(html, memStream);

            // convert an HTML file
            fileStream = new FileOutputStream("file.pdf");
            client.convertFile("/path/to/local/file.html", fileStream);
            fileStream.close();

            // retrieve the number of tokens in your account
            Integer ntokens = client.numTokens();
        }
        catch(PdfcrowdError why) {
            System.err.println(why.getMessage());
        }
        catch(IOException exc) {
            // handle the exception
        }
    }
}

To learn more, see the Pdfcrowd API Java client documentation.

import pdfcrowd

try:
    # create an API client instance
    client = pdfcrowd.Client("username", "apikey")

    # convert a web page and store the generated PDF into a pdf variable
    pdf = client.convertURI('http://www.google.com')

    # convert an HTML string and save the result to a file
    output_file = open('html.pdf', 'wb')
    html="<head></head><body>My HTML Layout</body>"
    client.convertHtml(html, output_file)
    output_file.close()

    # convert an HTML file
    output_file = open('file.pdf', 'wb')
    client.convertFile('/path/to/MyLayout.html', output_file)
    output_file.close()


except pdfcrowd.Error as why:
    print('Failed: {}'.format(why))

To learn more, see the Pdfcrowd API Python client documentation.


The following code shows how to convert a web page in a view function:
import pdfcrowd
from django.http import HttpResponse

def generate_pdf_view(request):
    try:
        # create an API client instance
        client = pdfcrowd.Client("username", "apikey")

        # convert a web page and store the generated PDF to a variable
        pdf = client.convertURI("http://www.google.com")

        # set HTTP response headers
        response = HttpResponse(content_type="application/pdf")
        response["Cache-Control"] = "no-cache"
        response["Accept-Ranges"] = "none"
        response["Content-Disposition"] = "attachment; filename=google_com.pdf"

        # send the generated PDF
        response.write(pdf)
    except pdfcrowd.Error as why:
        response = HttpResponse(content_type="text/plain")
        response.write(why)
    return response
You can also convert raw HTML code; just use the convertHtml() method instead of convertURI():
    pdf = client.convertHtml("<head></head><body>My HTML Layout</body>")
The API also lets you convert a local HTML file:
    pdf = client.convertFile("/path/to/MyLayout.html")

To learn more, see the Pdfcrowd API Python client documentation.

require 'rubygems'
require 'pdfcrowd'

begin
    # create an API client instance
    client = Pdfcrowd::Client.new("username", "apikey")

    # convert a web page and store the generated PDF into a pdf variable
    pdf = client.convertURI('http://www.google.com')

    # convert an HTML string and save the result to a file
    html="<head></head><body>My HTML Layout</body>"
    File.open('html.pdf', 'wb') {|f| client.convertHtml(html, f)}

    # convert an HTML file
    File.open('file.pdf', 'wb') {|f| client.convertFile('/path/to/MyLayout.html', f)}

    # retrieve the number of tokens in your account
    ntokens = client.numTokens()

rescue Pdfcrowd::Error => why
    print 'FAILED: ', why
end

To learn more, see the Pdfcrowd API Ruby client documentation.


The following code shows how to convert a web page in a controller method:
require 'rubygems'
require 'pdfcrowd'

def generatePdf
  begin
    # create an API client instance
    client = Pdfcrowd::Client.new("username", "apikey")

    # convert a web page and store the generated PDF to a variable
    pdf = client.convertURI("http://www.google.com")

    # send the generated PDF
    send_data(pdf, 
              :filename => "google_com.pdf",
              :type => "application/pdf",
              :disposition => "attachment")
  rescue Pdfcrowd::Error => why
    render :text => why
  end
end
You can also convert raw HTML code; just use the convertHtml() method instead of convertURI():
    pdf = client.convertHtml("<head></head><body>My HTML Layout</body>")
The API also lets you convert a local HTML file:
    pdf = client.convertFile("/path/to/MyLayout.html")

To learn more, see the Pdfcrowd API Ruby client documentation.

var pdf = require('pdfcrowd');

// create an API client instance
var client = new pdf.Pdfcrowd("username", "apikey");

// convert an HTML string and send the generated PDF in an HTTP response
client.convertHtml('<html>regular HTML code</html>', pdf.sendHttpResponse(response));

// convert a web page and save it to a file
client.convertURI('http://www.google.com', pdf.saveToFile("google_com.pdf"));

// convert a local HTML file:
client.convertFile('/local/file.html', pdf.saveToFile("file.pdf"));
$ curl -F "username=$username" -F "key=$apikey" -F 'src=http://www.google.com' \
>      http://pdfcrowd.com/api/pdf/convert/uri/ > google_com.pdf

$ curl -F "username=$username" -F "key=$apikey" -F 'src=@index.html' \
>      http://pdfcrowd.com/api/pdf/convert/html/ > file.pdf

$ html_producer | curl -F "username=$username" -F "key=$apikey" -F 'src=<-' \
>      http://pdfcrowd.com/api/pdf/convert/html/ > html.pdf

To learn more, see the REST API documentation.

$ pdfcrowd.sh http://www.google.com > google_com.pdf
$ pdfcrowd.sh /path/to/html_file > html.pdf
$ html_producer | pdfcrowd.sh - > file.pdf

To learn more, see the pdfcrowd.sh documentation.

Features

  • Create PDF from a web page or an uploaded HTML document.
  • Full HTML/CSS2/JavaScript support.
  • Set PDF options such as page dimensions, margins, encryption, permissions, and initial view.
  • Fully customizable headers and footers.
  • Watermark support.
  • You can upload a zipped HTML document with external resources (images, stylesheets).
  • When security is a concern, the API can be called over HTTPS.

Why Pdfcrowd API?

  • The API is available on virtually any platform that supports HTTP.
  • Easy integration, no third-party libraries needed.
  • The API is proven by our customers. Have a look at what they say about our solution.
  • We work very hard to provide high quality support on our forums.
  • 99.9% uptime. The API is fast and reliable; check our uptime report provided by a third party.

Testing the integration with your application is free. We will credit your account with enough free conversions so you can make sure that everything works to your satisfaction before going to production.

If you have any questions, please do not hesitate to ask. We will gladly help you on our support forums.

REST API Documentation

An API call is made by sending an HTTP request to the API endpoint with parameters passed as POST data. The API endpoints support both http and https schemes.

Authentication

The following parameters are used for authentication and must be present in every API call.

Parameter Description
username Your username at Pdfcrowd
key API key. Can be found on your account page. Note that the key is regenerated when you change your password.

PDF Creation

A PDF document can be created in one of the following ways. The created PDF is returned in the response body.

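Convert a web page
Method: POST
API Endpoint: http://pdfcrowd.com/api/pdf/convert/uri/
Content Type: application/x-www-form-urlencoded
Post Data, mandatory:
  • src - The URL of the web page to convert.
  • authentication parameters
Post Data, optional:
  • common parameters (see Common Parameters below)

Convert a local HTML file
Method: POST
API Endpoint: http://pdfcrowd.com/api/pdf/convert/html/
Content Type: multipart/form-data
Post Data, mandatory:
  • src - A path to a local file. The file can be either an HTML document or a .tar.gz, .tar.bz2, or .zip archive, which can contain external resources such as stylesheets and images.
  • authentication parameters
Post Data, optional:
  • common parameters (see Common Parameters below)

Convert raw HTML code
Method: POST
API Endpoint: http://pdfcrowd.com/api/pdf/convert/html/
Content Type: application/x-www-form-urlencoded
Post Data, mandatory:
  • src - The raw HTML code to convert.
  • authentication parameters
Post Data, optional:
  • common parameters (see Common Parameters below)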
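
As an illustration of the request shape described above, here is a minimal Python sketch that builds (but does not send) a POST request for the convert/uri endpoint. The helper name build_uri_request is hypothetical, and the src parameter name is taken from the curl examples on this page; treat it as a sketch under those assumptions rather than as the official client.

```python
import urllib.parse
import urllib.request

# Endpoint prefix as listed in this documentation.
API_BASE = "http://pdfcrowd.com/api/pdf/convert"


def build_uri_request(username, key, url, **options):
    """Build (but do not send) a POST request for the convert/uri endpoint.

    build_uri_request is a hypothetical helper, not part of any Pdfcrowd client.
    """
    fields = {"username": username, "key": key, "src": url}
    # Optional common parameters (width, no_print, ...) are passed through as-is.
    fields.update(options)
    body = urllib.parse.urlencode(fields).encode("utf-8")
    return urllib.request.Request(
        API_BASE + "/uri/",
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


request = build_uri_request("username", "apikey", "http://www.google.com", width="210mm")
print(request.full_url)
print(request.data.decode("utf-8"))
```

Passing the returned object to urllib.request.urlopen() would perform the actual call; the sketch stops short of that so it can be run without credentials.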

Common Parameters

The following optional parameters control properties of the generated PDF file.

Page Setup

Parameter Description
width PDF page width in units.
height PDF page height in units. -1 for a single page PDF.
margin_top Top PDF page margin in units.
margin_right Right PDF page margin in units.
margin_bottom Bottom PDF page margin in units.
margin_left Left PDF page margin in units.
hmargin Deprecated. Use margin_left and margin_right instead.
vmargin Deprecated. Use margin_top and margin_bottom instead.

Parameters described as being "in units" can be specified in inches (in), millimeters (mm), centimeters (cm), or points (pt). If no units are specified, points are assumed.

Examples: 210mm, 8.5in
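
To make the unit rule concrete, the following Python sketch converts a length value to points. It assumes the usual definitions (1 in = 72 pt, 1 in = 25.4 mm); the function name to_points is hypothetical, and the parsing rules are a guess at reasonable input handling, not a description of what the service does internally.

```python
import re

# Points per unit, assuming 1 in = 72 pt and 1 in = 25.4 mm = 2.54 cm.
POINTS_PER_UNIT = {"pt": 1.0, "in": 72.0, "mm": 72.0 / 25.4, "cm": 72.0 / 2.54}

# A number (optionally negative, e.g. height=-1) followed by an optional unit.
_LENGTH = re.compile(r"^\s*(-?\d+(?:\.\d+)?)\s*(pt|in|mm|cm)?\s*$")


def to_points(value):
    """Convert a length such as '210mm' or '8.5in' to points."""
    match = _LENGTH.match(str(value))
    if match is None:
        raise ValueError("unrecognized length: %r" % (value,))
    number, unit = match.groups()
    # A missing unit means points.
    return float(number) * POINTS_PER_UNIT[unit or "pt"]


print(to_points("8.5in"))
print(round(to_points("210mm"), 2))
print(to_points("36"))
```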

Header and Footer

Parameter Description
footer_html Places the specified HTML code inside the page footer. The following variables are expanded:
  • %u - URL to convert.
  • %p - The current page number.
  • %n - Total number of pages.
footer_url Loads HTML code from the specified URL and places it inside the page footer. See footer_html for the list of variables that are expanded.
header_html Places the specified HTML code inside the page header. See footer_html for the list of variables that are expanded.
header_url Loads HTML code from the specified URL and places it inside the page header. See footer_html for the list of variables that are expanded.
header_footer_page_exclude_list A comma-separated list of physical page numbers on which the header and footer are not printed. Negative numbers count backwards from the last page: -1 is the last page, -2 is the last but one page, and so on.
Example: "1,-1" will not print the header and footer on the first and the last page.
page_numbering_offset An offset between physical and logical page numbers. The default value is 0.
Example: if set to "1" then the page numbering will start with 1 on the second page.
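
The two page-numbering rules above can be sketched in Python as follows. Both function names are hypothetical, and ignoring out-of-range entries is an assumption made for the sketch; how the service treats such entries is not stated here.

```python
def excluded_pages(exclude_list, page_count):
    """Resolve a header_footer_page_exclude_list value to physical page numbers."""
    pages = set()
    for item in exclude_list.split(","):
        item = item.strip()
        if not item:
            continue
        number = int(item)
        # Negative numbers count backwards: -1 is the last page.
        physical = page_count + 1 + number if number < 0 else number
        # Assumption: entries outside 1..page_count are simply ignored.
        if 1 <= physical <= page_count:
            pages.add(physical)
    return sorted(pages)


def logical_page_number(physical, offset=0):
    """Logical page number under this reading of page_numbering_offset."""
    return physical - offset


print(excluded_pages("1,-1", 5))
print(logical_page_number(2, offset=1))
```

With a five-page document, "1,-1" resolves to the first and fifth pages, and an offset of 1 makes the second physical page carry the number 1, matching the two examples in the table.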

HTML options

Parameter Description
no_images Do not print images.
no_backgrounds Do not print backgrounds.
html_zoom HTML zoom in percent. It determines the precision used for rendering of the HTML content. Despite its name, it does not zoom the HTML content. Higher values can improve glyph positioning and can lead to overall better visual appearance of the generated PDF. The default value is 100.
no_javascript Do not run JavaScript in web pages.
no_hyperlinks Do not create hyperlinks in the PDF.
text_encoding The text encoding to use when none is specified in the web page. The default value is utf-8.
use_print_media Use the print CSS media type (if available).

PDF Options

Parameter Description
encrypted Encrypts the PDF. This prevents search engines from indexing the document.
author Sets the PDF author field.
user_pwd Protects the PDF with a user password. When a PDF has a user password, it must be supplied in order to view the document and to perform operations allowed by the access permissions. At most 32 characters.
owner_pwd Protects the PDF with an owner password. Supplying an owner password grants unlimited access to the PDF including changing the passwords and access permissions. At most 32 characters.
no_print Do not allow printing of the generated PDF.
no_modify Do not allow modification of the PDF.
no_copy Do not allow extraction of text and graphics from the PDF.
page_layout Specifies the initial page layout when the PDF is opened in a viewer:
  • 1 - Single page.
  • 2 - Continuous.
  • 3 - Continuous facing.
initial_pdf_zoom_type Specifies the initial page zoom type when the PDF is opened in a viewer:
  • 1 - Fit width.
  • 2 - Fit height.
  • 3 - Fit page.
  • 4 - Zoom. The zoom is specified by initial_pdf_zoom.
initial_pdf_zoom Specifies the initial page zoom when the PDF is opened in a viewer. Defaults to 100%.
page_mode Specifies the appearance of the PDF when opened:
  • 1 - Neither document outline nor thumbnail images visible.
  • 2 - Thumbnail images visible.
  • 3 - Full-screen mode.
max_pages Prints at most the specified number of pages.
pdf_name The file name of the created PDF (max 180 chars). If not specified then the name is auto-generated.
pdf_scaling_factor The scaling factor used to convert between HTML and PDF. The default value is 1.333 (4/3) which makes the PDF content up to 1/3 larger than HTML.
page_background_color The page background color in RRGGBB hexadecimal format.
transparent_background Do not print the body background. Requires the following CSS rule to be declared:
body {background-color:rgba(255,255,255,0.0);}
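
Several of the options above have documented limits (password and file name lengths, enumerated values, the RRGGBB color format). A client could check these before sending a request; the Python sketch below shows one way to do so. The function check_pdf_options is hypothetical and only encodes the limits listed in the table, so it says nothing about any further validation the service may perform.

```python
import re

_HEX_COLOR = re.compile(r"^[0-9A-Fa-f]{6}$")

# Allowed values taken from the PDF Options table.
_ENUMS = {
    "page_layout": (1, 2, 3),
    "initial_pdf_zoom_type": (1, 2, 3, 4),
    "page_mode": (1, 2, 3),
}


def check_pdf_options(options):
    """Return a list of problems found in a dict of PDF option values."""
    problems = []
    for name in ("user_pwd", "owner_pwd"):
        if len(options.get(name, "")) > 32:
            problems.append(name + " is longer than 32 characters")
    if len(options.get("pdf_name", "")) > 180:
        problems.append("pdf_name is longer than 180 characters")
    for name, allowed in _ENUMS.items():
        if name in options and options[name] not in allowed:
            problems.append("%s must be one of %s" % (name, allowed))
    color = options.get("page_background_color")
    if color is not None and not _HEX_COLOR.match(color):
        problems.append("page_background_color must be in RRGGBB format")
    return problems


print(check_pdf_options({"user_pwd": "secret", "page_layout": 2}))
print(check_pdf_options({"owner_pwd": "x" * 33, "page_mode": 4}))
```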

Watermark

Parameter Description
watermark_url A public absolute URL of the watermark image (must start either with http:// or https://). The supported formats are PNG and JPEG.
watermark_offset_x The horizontal watermark offset in units. The default value is 0.
watermark_offset_y The vertical watermark offset in units. The default value is 0.
watermark_rotation The watermark rotation in degrees.
watermark_in_background When set, the watermark is placed in the background. By default, the watermark is placed in the foreground.

Miscellaneous

Parameter Description
fail_on_non200 The conversion request will fail if the converted URL returns 4xx or 5xx HTTP status code.
content_disposition The value of the Content-Disposition HTTP header sent in the response. Allowed values:
  • inline - The browser will open the PDF in the browser window.
  • attachment - Forces the browser to pop up a Save As dialog. This is the default value.
pdfcrowd_logo Insert the Pdfcrowd logo into the footer.

Examples of API calls

The following command converts www.google.com to PDF:

$ curl -F "username=$username" -F "key=$apikey" \
>      -F 'src=http://www.google.com' \
>      http://pdfcrowd.com/api/pdf/convert/uri/ > google_com.pdf

The following command converts the output of some html_producer application to PDF and protects it with a user password:

$ html_producer | curl -F "username=$username" -F "key=$apikey" \
>                      -F 'src=<-' \
>                      -F 'user_pwd=secret' \
>                      http://pdfcrowd.com/api/pdf/convert/html/ > google_com.pdf

The following command converts a local file index.html to PDF and disables printing:

$ curl -F "username=$username" -F "key=$apikey" \
>      -F 'src=@index.html' \
>      -F 'no_print=1' \
>      http://pdfcrowd.com/api/pdf/convert/html/ > google_com.pdf

Note that the examples above ignore errors. If an API call fails, the output file will contain an error message instead of a PDF. For convenience, Pdfcrowd provides a shell wrapper around curl that checks for errors and generally makes interaction with the API from the command line friendlier.
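
When calling the API from code rather than from curl, the same pitfall can be avoided by checking the status code before writing the file. The Python sketch below uses only the standard library; fetch and write_pdf are hypothetical helper names, and the demonstration at the end feeds canned responses instead of making a live call.

```python
import os
import tempfile
import urllib.error
import urllib.request


def fetch(request):
    """Send a prepared request and return (status, body) without raising on 4xx/5xx."""
    try:
        with urllib.request.urlopen(request) as response:
            return response.status, response.read()
    except urllib.error.HTTPError as error:
        # urllib raises on error statuses; the body still holds the explanation.
        return error.code, error.read()


def write_pdf(status, body, path):
    """Write the response body to path only if the call succeeded."""
    if status != 200:
        # On failure the body is an error message, not a PDF.
        message = body.decode("utf-8", "replace")
        raise RuntimeError("Pdfcrowd API error %d: %s" % (status, message))
    with open(path, "wb") as output:
        output.write(body)


# Offline demonstration with canned responses instead of a live API call.
target = os.path.join(tempfile.mkdtemp(), "google_com.pdf")
write_pdf(200, b"%PDF-1.4", target)
try:
    write_pdf(400, b"Invalid request", target + ".failed")
except RuntimeError as error:
    print(error)
```

In real use the two helpers would be chained, for example write_pdf(*fetch(request), "google_com.pdf"), so that a failed call never leaves an error message in a file named like a PDF.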

User Status

Number of remaining tokens
Method: POST
API Endpoint: http://pdfcrowd.com/api/user/<username>/tokens/
Content Type: application/x-www-form-urlencoded
Post Data, mandatory:
  • authentication parameters

The response contains the result in plain text.

$ curl -F "username=$username" -F "key=$apikey" "http://pdfcrowd.com/api/user/$username/tokens/"
12399
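
A small Python sketch of the same query: one helper builds the per-user endpoint URL and another parses the plain-text reply. Both names are hypothetical, and the sketch assumes the response body contains only the number, as in the example output above.

```python
import urllib.parse


def tokens_url(username):
    """Build the token-count endpoint URL for a given username."""
    # Quote the username so unusual characters cannot break the path.
    return "http://pdfcrowd.com/api/user/%s/tokens/" % urllib.parse.quote(username, safe="")


def parse_tokens(body):
    """Parse the plain-text response body into an integer."""
    return int(body.decode("ascii").strip())


print(tokens_url("username"))
print(parse_tokens(b"12399\n"))
```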

Return Codes

The API returns the standard HTTP response status codes. These are the most important ones:

Code Description
200 OK The API call succeeded.
400 Bad Request The user sent an invalid request. The body of the response contains an explanation in plain text.
413 Request Entity Too Large See Limitations below.
502 Bad Gateway See Limitations below.
503 Service Unavailable See Limitations below.
510 A non-standard status code indicating that the server couldn't process the request. The body of the response contains an explanation in plain text.

Shell

Pdfcrowd provides a shell script which wraps curl and provides more convenient access to the API from the shell.

The script reads options from ~/.pdfcrowd and then from the command line. You can store your default options to ~/.pdfcrowd, one option per line:

$ cat > ~/.pdfcrowd <<!
> -username $username
> -key $apikey
> -width 210mm
> -height 297mm
> !

Now, you can run:

$ pdfcrowd.sh http://www.google.com > google_com.pdf
$ pdfcrowd.sh /path/to/local/file > html.pdf
$ html_producer | pdfcrowd.sh - > file.pdf
$ echo "remaining tokens: $(pdfcrowd.sh @)"

On success, exit status 0 is returned and the result is written to stdout. Otherwise, a non-zero exit status is returned and the error message is written to stderr. Run the script with -help to get the list of available options.
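
Because the script reports failure through its exit status and stderr, it is straightforward to drive from another program. The Python sketch below wraps any command that follows this contract; run_converter is a hypothetical helper, and the demonstration uses the Python interpreter as a stand-in so that it runs without pdfcrowd.sh installed.

```python
import subprocess
import sys


def run_converter(command):
    """Run a command that writes a PDF to stdout; raise if it exits non-zero."""
    result = subprocess.run(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    if result.returncode != 0:
        # The error message is expected on stderr.
        raise RuntimeError(result.stderr.decode("utf-8", "replace").strip())
    return result.stdout


# Stand-in commands; real use would be run_converter(["pdfcrowd.sh", "http://www.google.com"]).
ok = run_converter([sys.executable, "-c", "print('%PDF-1.4')"])
try:
    run_converter([sys.executable, "-c", "import sys; sys.stderr.write('boom'); sys.exit(2)"])
except RuntimeError as error:
    print("conversion failed:", error)
```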

Data Security

We understand that your data may be sensitive and confidential and that it is absolutely unacceptable to disclose it or keep unnecessary copies.

We want to be transparent about how we process your data, so here we describe an API conversion request lifecycle:

  1. Your request hits our HTTP server. Depending on the method you use:
    • Any data uploaded with convertHtml or convertFile is temporarily saved to a directory (dirA/) on our server.
    • Any other data is downloaded directly by the user agent during the conversion. The user agent has caching disabled.
  2. Our backend process creates a PDF from your data and saves it temporarily to a directory (dirB/) on our server.
  3. The HTTP server streams the created PDF from dirB/ back to you.
  4. Both dirA/ and dirB/ are scanned every 10 minutes by a script which deletes the old files.

API Limitations

The following limits are applied to ensure fair distribution of capacity.

If you find the limits too restrictive we can provision a private server optimized for your specific needs with the limits lifted. If you would like to discuss this option please contact us.

Rate Limiting

The API returns an HTTP 503 Service unavailable status code in the following cases:

  • The client sends more than 30 requests per minute from a single IP. Note that a query on the number of tokens counts against the limit.
  • The client sends more than one request at a time from a single IP.

    Ideally, you should serialize your API calls. However, we understand that you can't always ensure this in some environments, such as Google App Engine Task Queue. In such cases it is perfectly acceptable to check for this error and send the failed request again.
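
One way to "check for this error and send the failed request again" is a small retry loop, sketched below in Python. The helper call_with_retry and its parameters are hypothetical; the default delay of 2 seconds is simply 60 seconds divided by the 30 requests per minute limit, not a value recommended by Pdfcrowd. The send argument stands for whatever function performs the API call and returns a (status, body) pair.

```python
import time


def call_with_retry(send, attempts=3, delay=2.0, sleep=time.sleep):
    """Call send() until it returns a status other than 503 or attempts run out."""
    status, body = send()
    for _ in range(attempts - 1):
        if status != 503:
            break
        # Back off before retrying so the retry does not hit the rate limit again.
        sleep(delay)
        status, body = send()
    return status, body


# Demonstration with a stand-in for the real API call: two 503s, then success.
responses = iter([(503, b""), (503, b""), (200, b"%PDF-1.4")])
waits = []
status, body = call_with_retry(lambda: next(responses), sleep=waits.append)
print(status, len(waits))
```

The sleep function is injectable so the loop can be exercised without actually waiting; in production the default time.sleep would be used.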

Time Limit

If a request takes more than 40 seconds to complete it is cancelled and either of these messages may be returned:

  • 510 - 413 Timed out. Can't load the specified URL.
  • 502 - Sorry, we couldn't process your request.

A typical cause of this error is an HTML page with too many images that take too long to download. Another cause might be long-running JavaScript.

Size Limits

If the size of the uploaded data exceeds 20MB then an HTTP 413 Request entity too large error is returned. You can zip your HTML to avoid this error.

If the size of the downloaded data exceeds 100MB then an HTTP 502 error code is returned.
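
As a sketch of the "zip your HTML" suggestion, the Python code below packs a document directory into a .zip archive and compares the result with the upload limit. The helper names are hypothetical, and the sketch assumes that 20MB means 20 * 1024 * 1024 bytes, which this page does not specify.

```python
import os
import tempfile
import zipfile

# Assumption: 20MB is interpreted as 20 * 1024 * 1024 bytes.
UPLOAD_LIMIT = 20 * 1024 * 1024


def zip_document(directory, archive_path):
    """Pack an HTML document and its resources into a .zip archive.

    archive_path should lie outside directory so the archive does not include itself.
    """
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for root, _dirs, files in os.walk(directory):
            for name in files:
                full_path = os.path.join(root, name)
                # Store paths relative to the document root so links keep working.
                archive.write(full_path, os.path.relpath(full_path, directory))
    return os.path.getsize(archive_path)


def fits_upload_limit(size):
    """True if an upload of this many bytes stays within the limit."""
    return size <= UPLOAD_LIMIT


# Demonstration on a throwaway document.
source = tempfile.mkdtemp()
with open(os.path.join(source, "index.html"), "w") as page:
    page.write("<html><body>" + "My HTML Layout " * 1000 + "</body></html>")
archive_path = os.path.join(tempfile.mkdtemp(), "document.zip")
size = zip_document(source, archive_path)
print(fits_upload_limit(size))
```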

Miscellaneous

We also return an HTTP 502 error code when:

  • The converted page loads more than 2,000 sub-resources (images, external stylesheets, etc.).
  • There are more than 8 HTTP redirects when attempting to load a page sub-resource.