PDF to Text Python Examples

This page collects examples of using the PDF to Text API in Python. Each example is a complete script; file paths such as /path/to/invoice.pdf are placeholders for your own files. Read more about how to convert PDF to Text in Python.

Basic examples

Convert a local PDF file to a text file

import pdfcrowd
import sys

try:
    # create the API client instance
    client = pdfcrowd.PdfToTextClient('demo', 'ce544b6ea52a5621fb9d55f8b542d14d')

    # run the conversion and write the result to a file
    client.convertFileToFile('/path/to/invoice.pdf', 'invoice.txt')
except pdfcrowd.Error as why:
    # report the error
    sys.stderr.write('Pdfcrowd Error: {}\n'.format(why))

    # rethrow or handle the exception
    raise
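
When several local PDFs need converting, the file-to-file call above can be wrapped in a loop. The following is a minimal sketch: `text_path_for` and `convert_all` are hypothetical helpers, not part of the pdfcrowd library, and the only client method used is `convertFileToFile` from the example above.

```python
from pathlib import Path


def text_path_for(pdf_path, output_dir=None):
    """Return the .txt path that corresponds to a PDF path.

    Hypothetical helper, not part of the pdfcrowd library.
    """
    pdf_path = Path(pdf_path)
    # default to writing the text file next to the source PDF
    target_dir = Path(output_dir) if output_dir is not None else pdf_path.parent
    return target_dir / pdf_path.with_suffix('.txt').name


def convert_all(client, pdf_paths, output_dir=None):
    """Convert each PDF with client.convertFileToFile and return the output paths."""
    outputs = []
    for pdf_path in pdf_paths:
        out_path = text_path_for(pdf_path, output_dir)
        # same call as in the single-file example, one file at a time
        client.convertFileToFile(str(pdf_path), str(out_path))
        outputs.append(out_path)
    return outputs
```

With a client created as in the example above, `convert_all(client, ['/path/to/invoice.pdf'])` would write `/path/to/invoice.txt`. Errors raised by the client propagate to the caller, so the same `try`/`except pdfcrowd.Error` pattern still applies around the call.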

Convert a local PDF file to in-memory text

import pdfcrowd
import sys

try:
    # create the API client instance
    client = pdfcrowd.PdfToTextClient('demo', 'ce544b6ea52a5621fb9d55f8b542d14d')

    # run the conversion and store the result into the "txt" variable
    txt = client.convertFile('/path/to/invoice.pdf')

    # at this point the "txt" variable contains TXT raw data and
    # can be sent in an HTTP response, saved to a file, etc.
except pdfcrowd.Error as why:
    # report the error
    sys.stderr.write('Pdfcrowd Error: {}\n'.format(why))

    # rethrow or handle the exception
    raise
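
To work with the in-memory result as text rather than raw data, it usually has to be decoded first. The sketch below assumes the result arrives as UTF-8 encoded bytes (an assumption, not something stated on this page); `decode_text` is a hypothetical helper, not part of the pdfcrowd library.

```python
def decode_text(raw, encoding='utf-8'):
    """Return the conversion output as a str.

    Assumes the raw data is encoded as UTF-8 unless told otherwise.
    Undecodable bytes are replaced rather than raising an exception.
    """
    if isinstance(raw, bytes):
        return raw.decode(encoding, errors='replace')
    # already a str: nothing to decode
    return raw


def non_empty_lines(raw):
    """Split the decoded text into lines and drop the blank ones."""
    return [line for line in decode_text(raw).splitlines() if line.strip()]
```

For example, `non_empty_lines(txt)` could be applied to the "txt" variable from the example above to iterate over the extracted lines.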

Convert a local PDF file and write the resulting text to an output stream

import pdfcrowd
import sys

try:
    # create the API client instance
    client = pdfcrowd.PdfToTextClient('demo', 'ce544b6ea52a5621fb9d55f8b542d14d')

    # open an output stream for the conversion result;
    # the "with" block closes it even if the conversion fails
    with open('invoice.txt', 'wb') as output_stream:
        # run the conversion and write the result into the output stream
        client.convertFileToStream('/path/to/invoice.pdf', output_stream)
except pdfcrowd.Error as why:
    # report the error
    sys.stderr.write('Pdfcrowd Error: {}\n'.format(why))

    # rethrow or handle the exception
    raise
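
The output stream does not have to be a file on disk. The sketch below assumes the stream methods only need a writable binary object, so an `io.BytesIO` buffer would work as well; that is an assumption about the library's behavior, and `convert_to_bytes` is a hypothetical helper.

```python
import io


def convert_to_bytes(convert_to_stream, source):
    """Run a *ToStream conversion into an in-memory buffer.

    convert_to_stream is any callable taking (source, output_stream),
    for example the bound method client.convertFileToStream.
    """
    buffer = io.BytesIO()
    convert_to_stream(source, buffer)
    # return everything the conversion wrote to the buffer
    return buffer.getvalue()
```

With a client created as in the example above, the call would look like `convert_to_bytes(client.convertFileToStream, '/path/to/invoice.pdf')`.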

Convert a PDF file at a URL to a text file

import pdfcrowd
import sys

try:
    # create the API client instance
    client = pdfcrowd.PdfToTextClient('demo', 'ce544b6ea52a5621fb9d55f8b542d14d')

    # run the conversion and write the result to a file
    client.convertUrlToFile('https://pdfcrowd.com/static/pdf/apisamples/invoice.pdf', 'invoice.txt')
except pdfcrowd.Error as why:
    # report the error
    sys.stderr.write('Pdfcrowd Error: {}\n'.format(why))

    # rethrow or handle the exception
    raise

Convert a PDF file at a URL to in-memory text

import pdfcrowd
import sys

try:
    # create the API client instance
    client = pdfcrowd.PdfToTextClient('demo', 'ce544b6ea52a5621fb9d55f8b542d14d')

    # run the conversion and store the result into the "txt" variable
    txt = client.convertUrl('https://pdfcrowd.com/static/pdf/apisamples/invoice.pdf')

    # at this point the "txt" variable contains TXT raw data and
    # can be sent in an HTTP response, saved to a file, etc.
except pdfcrowd.Error as why:
    # report the error
    sys.stderr.write('Pdfcrowd Error: {}\n'.format(why))

    # rethrow or handle the exception
    raise

Convert a PDF file at a URL and write the resulting text to an output stream

import pdfcrowd
import sys

try:
    # create the API client instance
    client = pdfcrowd.PdfToTextClient('demo', 'ce544b6ea52a5621fb9d55f8b542d14d')

    # open an output stream for the conversion result;
    # the "with" block closes it even if the conversion fails
    with open('invoice.txt', 'wb') as output_stream:
        # run the conversion and write the result into the output stream
        client.convertUrlToStream('https://pdfcrowd.com/static/pdf/apisamples/invoice.pdf', output_stream)
except pdfcrowd.Error as why:
    # report the error
    sys.stderr.write('Pdfcrowd Error: {}\n'.format(why))

    # rethrow or handle the exception
    raise

Convert an in-memory PDF to a text file

import pdfcrowd
import sys

try:
    # create the API client instance
    client = pdfcrowd.PdfToTextClient('demo', 'ce544b6ea52a5621fb9d55f8b542d14d')

    # read the PDF into memory and close the input file
    with open('/path/to/hello_world.pdf', 'rb') as pdf_file:
        pdf_data = pdf_file.read()

    # run the conversion and write the result to a file
    client.convertRawDataToFile(pdf_data, 'invoice.txt')
except pdfcrowd.Error as why:
    # report the error
    sys.stderr.write('Pdfcrowd Error: {}\n'.format(why))

    # rethrow or handle the exception
    raise

Convert an in-memory PDF to in-memory text

import pdfcrowd
import sys

try:
    # create the API client instance
    client = pdfcrowd.PdfToTextClient('demo', 'ce544b6ea52a5621fb9d55f8b542d14d')

    # read the PDF into memory and close the input file
    with open('/path/to/hello_world.pdf', 'rb') as pdf_file:
        pdf_data = pdf_file.read()

    # run the conversion and store the result into the "txt" variable
    txt = client.convertRawData(pdf_data)

    # at this point the "txt" variable contains TXT raw data and
    # can be sent in an HTTP response, saved to a file, etc.
except pdfcrowd.Error as why:
    # report the error
    sys.stderr.write('Pdfcrowd Error: {}\n'.format(why))

    # rethrow or handle the exception
    raise

Convert an in-memory PDF and write the resulting text to an output stream

import pdfcrowd
import sys

try:
    # create the API client instance
    client = pdfcrowd.PdfToTextClient('demo', 'ce544b6ea52a5621fb9d55f8b542d14d')

    # read the PDF into memory and close the input file
    with open('/path/to/hello_world.pdf', 'rb') as pdf_file:
        pdf_data = pdf_file.read()

    # open an output stream; the "with" block closes it even if the conversion fails
    with open('invoice.txt', 'wb') as output_stream:
        # run the conversion and write the result into the output stream
        client.convertRawDataToStream(pdf_data, output_stream)
except pdfcrowd.Error as why:
    # report the error
    sys.stderr.write('Pdfcrowd Error: {}\n'.format(why))

    # rethrow or handle the exception
    raise

Get info about the current conversion

import pdfcrowd
import sys

try:
    # create the API client instance
    client = pdfcrowd.PdfToTextClient('demo', 'ce544b6ea52a5621fb9d55f8b542d14d')

    # configure the conversion
    client.setDebugLog(True)
    client.setPageBreakMode('default')

    # run the conversion and write the result to a file
    client.convertFileToFile('/path/to/invoice.pdf', 'invoice.txt')
    
    # print URL to the debug log
    print('Debug log url: {}'.format(client.getDebugLogUrl()))
    
    # print the number of available conversion credits in your account
    print('Remaining credit count: {}'.format(client.getRemainingCreditCount()))
    
    # print the number of credits consumed by the conversion
    print('Consumed credit count: {}'.format(client.getConsumedCreditCount()))
    
    # print the unique ID of the conversion
    print('Job id: {}'.format(client.getJobId()))
    
    # print the total number of pages in the output document
    print('Page count: {}'.format(client.getPageCount()))
    
    # print the size of the output in bytes
    print('Output size: {}'.format(client.getOutputSize()))
except pdfcrowd.Error as why:
    # report the error
    sys.stderr.write('Pdfcrowd Error: {}\n'.format(why))

    # rethrow or handle the exception
    raise
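
For logging or monitoring, the values printed above can be gathered into a single dictionary. This is a minimal sketch: `collect_conversion_info` is a hypothetical helper, and it calls only the getter methods that appear in the example above.

```python
def collect_conversion_info(client):
    """Collect conversion metadata from a client after a conversion has run.

    The dictionary keys are illustrative names chosen for this sketch;
    the getter names come from the example above.
    """
    getters = {
        'debug_log_url': 'getDebugLogUrl',
        'remaining_credits': 'getRemainingCreditCount',
        'consumed_credits': 'getConsumedCreditCount',
        'job_id': 'getJobId',
        'page_count': 'getPageCount',
        'output_size': 'getOutputSize',
    }
    # call each getter by name and store its result under the chosen key
    return {key: getattr(client, name)() for key, name in getters.items()}
```

After a conversion, `collect_conversion_info(client)` returns one record that can be passed to a logger or serialized, instead of six separate print calls.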

Advanced examples

Template rendering Examples