PDF to Text / HTTP API Reference

Conversion Input

url

The URL of the PDF file to convert.

Constraint:
  • Supported protocols are http:// and https://.

file

The path to a local file to convert.

Constraint:
  • The file must exist and not be empty.

data

The raw PDF data to convert.
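A minimal client-side sketch of choosing the conversion input and checking the constraints listed above before sending a request. It assumes that exactly one of url, file or data is supplied per conversion. The helper name select_input is hypothetical and is not part of any PDFCrowd client library.

```python
import os
from urllib.parse import urlparse


def select_input(url=None, file=None, data=None):
    """Return (parameter_name, value) for the single conversion input.

    Hypothetical helper. Assumption: exactly one of url, file or data
    is supplied for each conversion.
    """
    supplied = [(name, value)
                for name, value in (("url", url), ("file", file), ("data", data))
                if value is not None]
    if len(supplied) != 1:
        raise ValueError("specify exactly one of url, file or data")
    name, value = supplied[0]
    if name == "url" and urlparse(value).scheme not in ("http", "https"):
        # Only http:// and https:// addresses are supported.
        raise ValueError("url must start with http:// or https://")
    if name == "file" and (not os.path.isfile(value) or os.path.getsize(value) == 0):
        # The local file must exist and must not be empty.
        raise ValueError("file must exist and not be empty")
    return name, value
```

For example, select_input(url="https://example.com/report.pdf") returns ("url", "https://example.com/report.pdf"), while an ftp:// address raises ValueError.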

Conversion Format

input_format

The format of the input file.

Allowed Values:
  • pdf

output_format

The format of the output file.

Default:
txt
Allowed Values:
  • txt

Response

output_name

The file name of the created file (max. 180 characters). If not specified, the name is auto-generated.

content_disposition

The value of the Content-Disposition HTTP header sent in the response.

Default:
attachment
Allowed Values:
  • attachment — Forces the browser to pop up a Save As dialog.
  • inline — The browser will open the result file in the browser window.
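The format and response parameters above can be collected and checked in one place before a request is built. The sketch below is illustrative only: output_options is a hypothetical helper, and it simply fixes input_format and output_format to their only allowed values, pdf and txt.

```python
ALLOWED_DISPOSITIONS = {"attachment", "inline"}


def output_options(output_name=None, content_disposition="attachment"):
    """Build the format and response parameters.

    Hypothetical helper; input_format and output_format are set to
    their only allowed values.
    """
    if content_disposition not in ALLOWED_DISPOSITIONS:
        raise ValueError("content_disposition must be attachment or inline")
    params = {"input_format": "pdf", "output_format": "txt",
              "content_disposition": content_disposition}
    if output_name is not None:
        # The service limits the file name to 180 characters.
        if len(output_name) > 180:
            raise ValueError("output_name is limited to 180 characters")
        params["output_name"] = output_name
    return params
```

Leaving output_name out lets the service auto-generate the file name.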

General Options

pdf_password

The password to open the encrypted PDF file.

no_layout

Ignore the original PDF layout.

Default:
false
Allowed Values:
  • true, 1 or on
  • false, 0 or off
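The boolean options in this reference (no_layout, remove_hyphenation, remove_empty_lines and debug_log) accept three spellings for each value. A small sketch of mapping those spellings to a Python bool; parse_flag is a hypothetical helper, and it matches the listed spellings exactly rather than assuming the service ignores case.

```python
TRUE_VALUES = {"true", "1", "on"}
FALSE_VALUES = {"false", "0", "off"}


def parse_flag(value):
    """Map a boolean option value (e.g. for no_layout) to a Python bool.

    Hypothetical helper. Python bools pass through unchanged; other
    values are compared as strings against the documented spellings.
    """
    if isinstance(value, bool):
        return value
    text = str(value)
    if text in TRUE_VALUES:
        return True
    if text in FALSE_VALUES:
        return False
    raise ValueError(f"not a valid boolean option value: {value!r}")
```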

eol

The end-of-line convention for the text output.

Default:
unix
Allowed Values:
  • unix — The Unix convention "LF" is used.
  • dos — The DOS convention "CR LF" is used.
  • mac — The Mac convention "CR" is used.
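The three conventions differ only in the characters written at the end of each line: LF is "\n", CR LF is "\r\n" and CR is "\r". The sketch below re-encodes line endings locally, which can help when comparing output produced with different eol values; convert_eol is a hypothetical helper, not part of the API.

```python
EOL_SEQUENCES = {"unix": "\n", "dos": "\r\n", "mac": "\r"}


def convert_eol(text, eol="unix"):
    """Rewrite every line ending in text to the given convention."""
    if eol not in EOL_SEQUENCES:
        raise ValueError("eol must be unix, dos or mac")
    # Normalize CR LF and lone CR to LF first so mixed input is handled.
    normalized = text.replace("\r\n", "\n").replace("\r", "\n")
    return normalized.replace("\n", EOL_SEQUENCES[eol])
```

For example, convert_eol("a\nb\n", "dos") returns "a\r\nb\r\n".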

page_break_mode

Specify the page break mode for the text output.

Default:
none
Allowed Values:
  • none — No page breaks are inserted.
  • default — The standard page break code "FF" is used.
  • custom — The page break specified in custom_page_break is used.

custom_page_break

Specify the custom page break used when page_break_mode is set to custom.

Examples:
  • Clear text between pages: END OF PAGE
  • Visual separator with line break
    ----my page break----
    

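When page breaks are enabled, a client can split the returned text back into pages on the same separator: the form feed character "\f" for default, or the custom_page_break value for custom. A minimal sketch; split_pages is a hypothetical helper, and whether a break also follows the last page is not stated here, so the result may end with an empty element.

```python
def split_pages(text, page_break_mode="none", custom_page_break=None):
    """Split converted text into pages using the configured page break.

    Hypothetical helper. With mode none no breaks are inserted, so the
    whole text is returned as a single element.
    """
    if page_break_mode == "none":
        return [text]
    if page_break_mode == "default":
        separator = "\f"  # FF, the form feed character (U+000C)
    elif page_break_mode == "custom":
        if not custom_page_break:
            raise ValueError("custom mode needs a non-empty custom_page_break")
        separator = custom_page_break
    else:
        raise ValueError("page_break_mode must be none, default or custom")
    return text.split(separator)
```

For example, split_pages("one\ftwo", "default") returns ["one", "two"].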
paragraph_mode

Specify the paragraph detection mode.

Default:
none
Allowed Values:
  • none — No paragraph detection.
  • bounding-box — Paragraph detection based on line bounding boxes.
  • characters — Paragraph detection based on the number of characters in the line.

line_spacing_threshold

Set the maximum line spacing when paragraph detection is enabled (paragraph_mode is not none).

Constraint:
  • The value must be a positive integer percentage.
Default:
10%
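A sketch of building the paragraph detection parameters and validating the threshold as a positive integer percentage. The helper paragraph_options is hypothetical, and writing the value with a trailing "%" only mirrors the documented default of 10%; the exact wire format is an assumption.

```python
import re

PARAGRAPH_MODES = {"none", "bounding-box", "characters"}


def paragraph_options(paragraph_mode="none", line_spacing_threshold=None):
    """Build paragraph detection parameters.

    Hypothetical helper. The threshold is emitted as "N%"; this format
    is an assumption based on the documented default.
    """
    if paragraph_mode not in PARAGRAPH_MODES:
        raise ValueError("paragraph_mode must be none, bounding-box or characters")
    params = {"paragraph_mode": paragraph_mode}
    if line_spacing_threshold is not None:
        if paragraph_mode == "none":
            # The threshold only applies when paragraph detection is enabled.
            raise ValueError("line_spacing_threshold needs paragraph detection")
        match = re.fullmatch(r"([1-9][0-9]*)%?", str(line_spacing_threshold))
        if match is None:
            raise ValueError("line_spacing_threshold must be a positive integer percentage")
        params["line_spacing_threshold"] = match.group(1) + "%"
    return params
```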

remove_hyphenation

Remove the hyphen character from the end of lines.

Default:
false
Allowed Values:
  • true, 1 or on
  • false, 0 or off

remove_empty_lines

Remove empty lines from the text output.

Default:
false
Allowed Values:
  • true, 1 or on
  • false, 0 or off

crop_area_x

Set the top-left X coordinate of the crop area in points.

Constraint:
  • Must be a positive integer or 0.
Example:
  • Start extraction 1.4 inches from the left: 100

crop_area_y

Set the top-left Y coordinate of the crop area in points.

Constraint:
  • Must be a positive integer or 0.
Example:
  • Start extraction 1.4 inches from the top: 100

crop_area_width

Set the width of the crop area in points.

Constraint:
  • Must be a positive integer or 0.
Default:
PDF page width.
Example:
  • Extract narrow 1.4 inch width: 100

crop_area_height

Set the height of the crop area in points.

Constraint:
  • Must be a positive integer or 0.
Default:
PDF page height.
Example:
  • Extract small 1.4 inch height: 100
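Crop coordinates are in points, and there are 72 points to the inch, which is why 100 in the examples corresponds to about 1.4 inches (100 / 72 ≈ 1.39). A sketch of converting a rectangle measured in inches into the integer point values these parameters expect; crop_area_from_inches is a hypothetical helper, and rounding to the nearest point is this sketch's own choice.

```python
POINTS_PER_INCH = 72


def crop_area_from_inches(x, y, width=None, height=None):
    """Convert a crop rectangle given in inches to integer points.

    Hypothetical helper. Omitted width or height are left out so the
    service falls back to the PDF page width or height.
    """
    params = {}
    for name, inches in (("crop_area_x", x), ("crop_area_y", y),
                         ("crop_area_width", width), ("crop_area_height", height)):
        if inches is None:
            continue
        points = round(inches * POINTS_PER_INCH)
        if points < 0:
            raise ValueError(f"{name} must be a positive integer or 0")
        params[name] = points
    return params
```

For a US Letter page, crop_area_from_inches(0, 0, 8.5, 11) gives a width of 612 and a height of 792 points.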

Miscellaneous

debug_log

Turn on debug logging. Details about the conversion are stored in the debug log. The URL of the log is returned in the x-pdfcrowd-debug-log response header and is also available in the conversion statistics.

Default:
false
Allowed Values:
  • true, 1 or on
  • false, 0 or off
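HTTP header names are case-insensitive, so a client reading the debug log URL should not depend on the exact capitalization of x-pdfcrowd-debug-log. A minimal sketch that works on any mapping of response headers; debug_log_url is a hypothetical helper.

```python
def debug_log_url(headers):
    """Return the debug log URL from response headers, or None.

    headers is any mapping of header names to values, such as a dict.
    The lookup ignores case because HTTP header names are case-insensitive.
    """
    for name, value in headers.items():
        if name.lower() == "x-pdfcrowd-debug-log":
            return value
    return None
```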

tag

Tag the conversion with a custom value. The tag is used in conversion statistics. A value longer than 32 characters is cut off.

Example:
  • Track job in analytics: client-1234
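Because tags longer than 32 characters are cut off, it can help to compute the stored value up front when tags are later matched against conversion statistics. A minimal sketch; effective_tag is a hypothetical helper, and treating one Python string character as one tag character is an assumption.

```python
MAX_TAG_LENGTH = 32


def effective_tag(tag):
    """Return the tag as it is expected to appear in conversion statistics."""
    # Values longer than 32 characters are cut off by the service.
    return tag[:MAX_TAG_LENGTH]
```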

http_proxy

A proxy server used by the conversion process for accessing source URLs with the HTTP scheme. It can help to circumvent regional restrictions or to provide limited access to your intranet.

Constraint:
  • The value must have the format DOMAIN_OR_IP_ADDRESS:PORT.
Examples:
  • Corporate proxy server: myproxy.com:8080
  • Direct IP proxy connection: 113.25.84.10:33333

https_proxy

A proxy server used by the conversion process for accessing source URLs with the HTTPS scheme. It can help to circumvent regional restrictions or to provide limited access to your intranet.

Constraint:
  • The value must have the format DOMAIN_OR_IP_ADDRESS:PORT.
Examples:
  • Secure proxy for HTTPS: myproxy.com:443
  • Direct secure proxy IP: 113.25.84.10:44333
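Both proxy options share the DOMAIN_OR_IP_ADDRESS:PORT format, so one check can cover http_proxy and https_proxy. A sketch of that check; parse_proxy is a hypothetical helper, it accepts only ports 1 to 65535, rejects values that include a scheme such as http://, and does not handle IPv6 literals.

```python
import re

HOST_PATTERN = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9.-]*[A-Za-z0-9])?")


def parse_proxy(value):
    """Split a DOMAIN_OR_IP_ADDRESS:PORT proxy value into (host, port).

    Hypothetical helper. IPv6 literals are not handled by this sketch.
    """
    host, sep, port_text = value.rpartition(":")
    if not sep or not HOST_PATTERN.fullmatch(host) or not port_text.isdigit():
        raise ValueError(f"expected DOMAIN_OR_IP_ADDRESS:PORT, got {value!r}")
    port = int(port_text)
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return host, port
```

For example, parse_proxy("myproxy.com:8080") returns ("myproxy.com", 8080).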