PDF to Text / Python Documentation

Overview

This page is a guide to using the PDFCrowd API to extract text from PDF files in Python applications.

Installation

You can install the client library from PyPI:

```
pip install pdfcrowd
```

Download pdfcrowd-6.6.0-python.zip.
Extract the archive and run the following commands:
```
cd pdfcrowd-6.6.0
pip install .
```

Clone pdfcrowd-python from GitHub and install the library:

```
git clone https://github.com/pdfcrowd/pdfcrowd-python
cd pdfcrowd-python
pip install .
```

Quick Start

Below are Python examples to help you quickly get started with the API. Explore our additional examples for more insights.

Convert a local PDF file:

```
import pdfcrowd
import sys

try:
    # Create an API client instance.
    client = pdfcrowd.PdfToTextClient('demo', 'demo')

    # Run the conversion and save the result to a file.
    client.convertFileToFile('/path/to/invoice.pdf', 'invoice.txt')

except pdfcrowd.Error as why:
    sys.stderr.write('PDFCrowd Error: {}\n'.format(why))
    raise
```

Convert a PDF from a URL:

```
import pdfcrowd
import sys

try:
    # Create an API client instance.
    client = pdfcrowd.PdfToTextClient('demo', 'demo')

    # Run the conversion and save the result to a file.
    client.convertUrlToFile('https://pdfcrowd.com/static/pdf/apisamples/invoice.pdf', 'invoice.txt')

except pdfcrowd.Error as why:
    sys.stderr.write('PDFCrowd Error: {}\n'.format(why))
    raise
```

Convert raw PDF data held in memory:

```
import pdfcrowd
import sys

try:
    # Create an API client instance.
    client = pdfcrowd.PdfToTextClient('demo', 'demo')

    # Read the PDF into memory; the with-block closes the file handle.
    with open('/path/to/hello_world.pdf', 'rb') as pdf_file:
        pdf_data = pdf_file.read()

    # Run the conversion and save the result to a file.
    client.convertRawDataToFile(pdf_data, 'invoice.txt')

except pdfcrowd.Error as why:
    sys.stderr.write('PDFCrowd Error: {}\n'.format(why))
    raise
```
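The examples above all write the result to a file. When the text is needed in memory instead, something like the following sketch may help. It assumes the client also offers a `convertFile()` method that returns the output as bytes, and that the extracted text is UTF-8 encoded; both should be checked against the method reference. `extract_text` is a hypothetical helper name, not part of the library.

```python
def extract_text(client, pdf_path, encoding='utf-8'):
    """Convert a local PDF and return the extracted text as a str.

    `client` is expected to be a pdfcrowd.PdfToTextClient. The UTF-8
    default is an assumption; change it if your output differs.
    """
    raw = client.convertFile(pdf_path)  # assumed to return bytes
    return raw.decode(encoding)


# Usage (requires the pdfcrowd package and network access):
#   import pdfcrowd
#   client = pdfcrowd.PdfToTextClient('demo', 'demo')
#   print(extract_text(client, '/path/to/invoice.pdf'))
```

Passing the client in as an argument keeps the helper independent of how credentials are obtained.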

Authentication

The API requires authentication using your PDFCrowd username and API key. To get started quickly, you can use these demo credentials for testing:

Username: demo
API key: demo

Use your own PDFCrowd username and API key for production. You can obtain API credentials from a free trial or API license.
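One way to keep production credentials out of source code is to read them from environment variables and fall back to the demo account for testing. This is a minimal sketch: the variable names `PDFCROWD_USERNAME` and `PDFCROWD_API_KEY` and the helper names are assumptions made for illustration, not something the library defines.

```python
import os


def load_credentials(env=None):
    """Return (username, api_key), defaulting to the demo account.

    The environment variable names are hypothetical; use whatever
    naming your deployment prefers.
    """
    if env is None:
        env = os.environ
    username = env.get('PDFCROWD_USERNAME', 'demo')
    api_key = env.get('PDFCROWD_API_KEY', 'demo')
    return username, api_key


def make_client(env=None):
    """Create a PdfToTextClient from the loaded credentials."""
    import pdfcrowd  # imported here so load_credentials works without it

    username, api_key = load_credentials(env)
    return pdfcrowd.PdfToTextClient(username, api_key)
```

With neither variable set, `load_credentials()` returns the demo credentials, so the same code runs unchanged in a test environment.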

Error Handling

Implement error handling to catch the errors the API may return. It keeps your application stable and makes failures easier to diagnose. See the example code below for guidance, and refer to this list of status codes for more information.

```
import pdfcrowd
import sys

client = pdfcrowd.PdfToTextClient('demo', 'demo')

try:
    # Call the API.
    client.convertFileToFile('/path/to/invoice.pdf', 'invoice.txt')
except pdfcrowd.Error as why:
    # Log the complete error
    sys.stderr.write('PDFCrowd Error: {}\n'.format(why))

    # Log the HTTP status code
    sys.stderr.write('Status Code: {}\n'.format(why.getStatusCode()))

    # Log the reason code
    sys.stderr.write('Reason Code: {}\n'.format(why.getReasonCode()))

    # Log the error message
    sys.stderr.write('Error Message: {}\n'.format(why.getMessage()))

    # Log the documentation link
    sys.stderr.write('Documentation Link: {}\n'.format(why.getDocumentationLink()))
```
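If your application uses the standard `logging` module rather than writing to `sys.stderr`, the same error fields can be gathered into one record. The sketch below uses only the four getters shown above; `error_details` and `log_error` are hypothetical helper names, not part of the library.

```python
import logging


def error_details(err):
    """Collect the fields exposed by a pdfcrowd.Error into a dict."""
    return {
        'status_code': err.getStatusCode(),
        'reason_code': err.getReasonCode(),
        'message': err.getMessage(),
        'documentation_link': err.getDocumentationLink(),
    }


def log_error(err, logger=None):
    """Log the error fields as a single line and return them."""
    logger = logger or logging.getLogger('pdfcrowd')
    details = error_details(err)
    logger.error('PDFCrowd error: %s', details)
    return details


# Usage inside an except block:
#   except pdfcrowd.Error as why:
#       log_error(why)
#       raise
```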

Troubleshooting

If you are receiving an error, refer to the API Status Codes for more information.
Use setDebugLog() and getDebugLogUrl() to obtain detailed information about the conversion process, including load errors, load times, browser console output, etc.
Consult the FAQ for answers to common questions.
Contact us if you need assistance or if a feature you need is missing.
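The debug log step above can be sketched as follows. It assumes `setDebugLog()` takes a boolean that enables logging and that `getDebugLogUrl()` is called after the conversion has run; `convert_with_debug_log` is a hypothetical helper name.

```python
def convert_with_debug_log(client, pdf_path, output_path):
    """Run a file conversion with the debug log enabled.

    `client` is expected to be a pdfcrowd.PdfToTextClient.
    Returns the URL of the debug log for the conversion.
    """
    # Turn on debug logging before the conversion (assumed boolean flag).
    client.setDebugLog(True)

    # Run the conversion and save the result to a file.
    client.convertFileToFile(pdf_path, output_path)

    # The log URL is read once the conversion has finished.
    return client.getDebugLogUrl()


# Usage (requires the pdfcrowd package and network access):
#   import pdfcrowd
#   client = pdfcrowd.PdfToTextClient('demo', 'demo')
#   print(convert_with_debug_log(client, '/path/to/invoice.pdf', 'invoice.txt'))
```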

Method Reference

Refer to the PDF to Text Python Reference for a description of all API methods.