PDF to Text / HTTP API Reference

Conversion Input

url
The address of the PDF to convert.
Constraint:
  • The supported protocols are http:// and https://.
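The protocol constraint can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API; the helper name is_supported_url is hypothetical.

```python
from urllib.parse import urlparse


def is_supported_url(url: str) -> bool:
    """Return True if the URL uses http:// or https:// and names a host."""
    parsed = urlparse(url)
    # urlparse lowercases the scheme, so "HTTPS://..." is accepted too.
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)
```

For example, is_supported_url("ftp://example.com/a.pdf") returns False.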
file
The path to a local file to convert.
Constraint:
  • The file must exist and not be empty.
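A client can verify the file constraint locally before uploading. This is a sketch under the assumption that "not empty" means a size greater than zero bytes; the helper name is_convertible_file is hypothetical.

```python
import os


def is_convertible_file(path: str) -> bool:
    """Return True if path is an existing regular file with at least one byte."""
    return os.path.isfile(path) and os.path.getsize(path) > 0
```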
data
The raw data to convert.

Conversion Format

input_format
The format of the input file.
Allowed values:
  • pdf
output_format
The format of the output file.
Allowed values:
  • txt
Default: txt

Response

output_name
The file name of the created file (max 180 chars). If not specified, the name is auto-generated.
content_disposition
The value of the Content-Disposition HTTP header sent in the response.
Allowed values:
  • attachment
    Forces the browser to pop up a Save As dialog.
  • inline
    The browser will open the result file in the browser window.
Default: attachment
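The two response options can be validated together before they are sent. The sketch below only mirrors the limits stated above (180 characters, two allowed values); the function name response_options and the choice to raise ValueError are assumptions for illustration, not the service's behavior.

```python
ALLOWED_DISPOSITIONS = ("attachment", "inline")
MAX_OUTPUT_NAME_LENGTH = 180


def response_options(output_name=None, content_disposition="attachment"):
    """Build the response-related request fields, validating them first."""
    if content_disposition not in ALLOWED_DISPOSITIONS:
        raise ValueError("content_disposition must be 'attachment' or 'inline'")
    fields = {"content_disposition": content_disposition}
    if output_name is not None:
        if len(output_name) > MAX_OUTPUT_NAME_LENGTH:
            raise ValueError("output_name is limited to 180 characters")
        fields["output_name"] = output_name
    # When output_name is omitted, the name is auto-generated by the service.
    return fields
```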

General Options

pdf_password
The password to open the encrypted PDF file.
Set the page range to print.
Constraint:
  • A comma-separated list of page numbers or ranges.
Examples:
  • Just the second page is printed.
    2
  • The first and the third page are printed.
    1,3
  • Everything except the first page is printed.
    2-
  • Just the first 3 pages are printed.
    -3
  • Pages 3, 6, 7, 8 and 9 are printed.
    3,6-9
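To make the range syntax concrete, the sketch below expands a range string into explicit page numbers. It is a client-side illustration only: the function name expand_page_range is hypothetical, and clamping to the document's page count is an assumption about how out-of-range values might be handled.

```python
from typing import List


def expand_page_range(spec: str, page_count: int) -> List[int]:
    """Expand a spec such as "3,6-9" into a list of 1-based page numbers."""
    pages = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            # An open start ("-3") begins at page 1; an open end ("2-")
            # runs to the last page.
            start = int(start_text) if start_text else 1
            end = int(end_text) if end_text else page_count
        else:
            start = end = int(part)
        # Assumption: pages past the end of the document are ignored.
        pages.extend(range(start, min(end, page_count) + 1))
    return pages
```

With a 10-page document, expand_page_range("3,6-9", 10) yields pages 3, 6, 7, 8 and 9, matching the last example above.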
no_layout
Ignore the original PDF layout.
Allowed values:
  • true, 1 or on
  • false, 0 or off
Default: false
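Boolean options throughout this reference accept the same six spellings. A minimal sketch of converting between them and Python booleans follows; the helper names are hypothetical, and exact-match (case-sensitive) comparison is an assumption.

```python
TRUE_VALUES = frozenset({"true", "1", "on"})
FALSE_VALUES = frozenset({"false", "0", "off"})


def parse_flag(value: str) -> bool:
    """Map one of the documented spellings to a Python bool."""
    if value in TRUE_VALUES:
        return True
    if value in FALSE_VALUES:
        return False
    raise ValueError("expected one of: true, 1, on, false, 0, off")


def format_flag(enabled: bool) -> str:
    """Return a spelling suitable for sending as a request field."""
    return "true" if enabled else "false"
```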
eol
The end-of-line convention for the text output.
Allowed values:
  • unix
    Unix convention "LF" is used.
  • dos
    DOS convention "CR LF" is used.
  • mac
    Mac convention "CR" is used.
Default: unix
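The three conventions correspond to the following byte sequences. The sketch below re-encodes already converted text on the client, for instance to compare outputs produced with different settings; apply_eol is a hypothetical helper, not an API call.

```python
EOL_SEQUENCES = {
    "unix": "\n",    # LF
    "dos": "\r\n",   # CR LF
    "mac": "\r",     # CR
}


def apply_eol(text: str, eol: str = "unix") -> str:
    """Rewrite every line ending in text to the chosen convention."""
    # Normalize to LF first so mixed input is handled consistently.
    normalized = text.replace("\r\n", "\n").replace("\r", "\n")
    return normalized.replace("\n", EOL_SEQUENCES[eol])
```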
page_break_mode
Specify the page break mode for the text output.
Allowed values:
  • none
    No page breaks are inserted.
  • default
    The standard page break code "FF" is used.
  • custom
    A custom page break is used.
Default: none
custom_page_break
Specify the custom page break.
Examples:
  • END OF PAGE
  • ----my page break----
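The relationship between page_break_mode and custom_page_break can be sketched as a choice of separator. This is an illustration only: the function join_pages is hypothetical, and placing the break strictly between pages (with no extra newlines and none after the last page) is an assumption about the output, not documented behavior.

```python
FORM_FEED = "\f"  # the "FF" control character, U+000C


def join_pages(pages, page_break_mode="none", custom_page_break=""):
    """Concatenate per-page text using the separator implied by the mode."""
    if page_break_mode == "none":
        separator = ""
    elif page_break_mode == "default":
        separator = FORM_FEED
    elif page_break_mode == "custom":
        separator = custom_page_break
    else:
        raise ValueError("page_break_mode must be none, default or custom")
    return separator.join(pages)
```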
paragraph_mode
Specify the paragraph detection mode.
Allowed values:
  • none
    No paragraph detection.
  • bounding-box
    Paragraph detection based on line bounding boxes.
  • characters
    Paragraph detection based on the number of characters in the line.
Default: none
line_spacing_threshold
Set the maximum line spacing when the paragraph detection mode is enabled.
Constraint:
  • The value must be a positive integer percentage.
Default: 10%
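A client-side check for the threshold value might look like the sketch below. It assumes the value is written with a trailing percent sign, as the default suggests; the helper name is_valid_threshold is hypothetical.

```python
import re


def is_valid_threshold(value: str) -> bool:
    """Return True for a positive integer followed by '%', such as "10%"."""
    match = re.fullmatch(r"(\d+)%", value)
    # Zero is rejected because the constraint asks for a positive value.
    return match is not None and int(match.group(1)) > 0
```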
remove_hyphenation
Remove the hyphen character from the end of lines.
Allowed values:
  • true, 1 or on
  • false, 0 or off
Default: false
remove_empty_lines
Remove empty lines from the text output.
Allowed values:
  • true, 1 or on
  • false, 0 or off
Default: false
crop_area_x
Set the top left X coordinate of the crop area in points.
Constraint:
  • Must be a positive integer number or 0.
Default: 0
Example:
  • 100
crop_area_y
Set the top left Y coordinate of the crop area in points.
Constraint:
  • Must be a positive integer number or 0.
Default: 0
Example:
  • 100
crop_area_width
Set the width of the crop area in points.
Constraint:
  • Must be a positive integer number or 0.
Default: PDF page width.
Example:
  • 100
crop_area_height
Set the height of the crop area in points.
Constraint:
  • Must be a positive integer number or 0.
Default: PDF page height.
Example:
  • 100
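The four crop options describe one rectangle measured in points (a PDF point is 1/72 inch). The following sketch builds the corresponding request fields and rejects invalid values; the helper names are hypothetical, and omitting width or height so that the documented defaults apply is an assumption about how a client would typically use them.

```python
POINTS_PER_INCH = 72


def inches_to_points(inches: float) -> int:
    """Convert inches to whole points."""
    return round(inches * POINTS_PER_INCH)


def crop_area_fields(x=0, y=0, width=None, height=None):
    """Return the crop_area_* fields as strings, skipping unset sizes."""
    fields = {}
    values = (
        ("crop_area_x", x),
        ("crop_area_y", y),
        ("crop_area_width", width),
        ("crop_area_height", height),
    )
    for name, value in values:
        if value is None:
            # Left unset so the default (page width or height) is used.
            continue
        if isinstance(value, bool) or not isinstance(value, int) or value < 0:
            raise ValueError(name + " must be a positive integer or 0")
        fields[name] = str(value)
    return fields
```

For example, crop_area_fields(x=inches_to_points(1), y=0, width=200) selects a 200-point-wide strip starting one inch from the left edge.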

Miscellaneous

debug_log
Turn on debug logging. Details about the conversion are stored in the debug log. The URL of the log is returned in the x-pdfcrowd-debug-log response header and is also available in the conversion statistics.
Allowed values:
  • true, 1 or on
  • false, 0 or off
Default: false
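Reading the log URL from a response can be sketched as below. The function takes any mapping of response header names to values, since HTTP header names are case-insensitive and libraries differ in how they present them; debug_log_url is a hypothetical helper, and the URL in the usage note is a placeholder.

```python
DEBUG_LOG_HEADER = "x-pdfcrowd-debug-log"


def debug_log_url(headers):
    """Return the debug log URL from response headers, or None if absent."""
    for name, value in headers.items():
        # Compare case-insensitively, as HTTP header names allow any casing.
        if name.lower() == DEBUG_LOG_HEADER:
            return value
    return None
```

For example, debug_log_url({"X-Pdfcrowd-Debug-Log": "https://example.com/log"}) returns the placeholder URL, and the function returns None when debug_log was not enabled.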
tag
Tag the conversion with a custom value. The tag is used in conversion statistics. A value longer than 32 characters is cut off.
Example:
  • client-1234
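Because longer values are cut off, a client that later looks up conversions by tag should search for the shortened form. A minimal sketch, with the hypothetical helper effective_tag:

```python
MAX_TAG_LENGTH = 32


def effective_tag(tag: str) -> str:
    """Return the tag as it would appear after the 32-character cut-off."""
    return tag[:MAX_TAG_LENGTH]
```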
http_proxy
A proxy server used by the Pdfcrowd conversion process for accessing source URLs with the HTTP scheme. It can help to circumvent regional restrictions or provide limited access to your intranet.
Constraint:
  • The value must have the format DOMAIN_OR_IP_ADDRESS:PORT.
Examples:
  • myproxy.com:8080
  • 113.25.84.10:33333
https_proxy
A proxy server used by the Pdfcrowd conversion process for accessing source URLs with the HTTPS scheme. It can help to circumvent regional restrictions or provide limited access to your intranet.
Constraint:
  • The value must have the format DOMAIN_OR_IP_ADDRESS:PORT.
Examples:
  • myproxy.com:443
  • 113.25.84.10:44333
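The DOMAIN_OR_IP_ADDRESS:PORT format shared by http_proxy and https_proxy can be checked before sending a request. This is a rough sketch: the pattern accepts host names and dotted IPv4 addresses only, the port range check (1 to 65535) is an assumption beyond what is stated above, and parse_proxy is a hypothetical helper.

```python
import re

# Host: letters, digits, dots and hyphens. Port: one to five digits.
_PROXY_PATTERN = re.compile(r"(?P<host>[A-Za-z0-9.-]+):(?P<port>\d{1,5})")


def parse_proxy(value: str):
    """Split "host:port" into (host, port), raising ValueError if malformed."""
    match = _PROXY_PATTERN.fullmatch(value)
    if match is None:
        raise ValueError("expected DOMAIN_OR_IP_ADDRESS:PORT")
    port = int(match.group("port"))
    if not 1 <= port <= 65535:
        raise ValueError("port must be between 1 and 65535")
    return match.group("host"), port
```

Note that a value with a scheme prefix, such as http://myproxy.com:8080, is rejected because the format does not include one.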