PDF to Text / Command Line Reference

Conversion from PDF to text.

usage: pdf2text [options] source
source
Source to be converted. It can be a URL, a path to a local file, or '-' to read the input from stdin.

Options

General Options

-pdf-password

The password to open the encrypted PDF file.

-no-layout

Ignore the original PDF layout.

Default:
False
Allowed Values:
  • true, 1 or on
  • false, 0 or off
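
The boolean options in this reference all accept the same six spellings. As a minimal sketch, a wrapper script could normalize them as shown here. The helper name parse_flag is hypothetical and is not part of pdf2text, and the case-insensitive matching is an assumption rather than documented behavior.

```python
# Hypothetical helper for a wrapper script; pdf2text does not provide it.
TRUE_VALUES = {"true", "1", "on"}
FALSE_VALUES = {"false", "0", "off"}


def parse_flag(value: str) -> bool:
    """Map one of the documented boolean spellings to a Python bool."""
    # Assumption: surrounding whitespace and letter case are ignored.
    normalized = value.strip().lower()
    if normalized in TRUE_VALUES:
        return True
    if normalized in FALSE_VALUES:
        return False
    raise ValueError(f"not an allowed boolean value: {value!r}")
```

Rejecting unknown spellings early, instead of passing them through, keeps a typo such as "yes" from reaching the converter.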

-eol

The end-of-line convention for the text output.

Default:
unix
Allowed Values:
  • unix — Unix convention "LF" is used.
  • dos — DOS convention "CR LF" is used.
  • mac — Mac convention "CR" is used.
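
To make the three conventions concrete, the sketch below maps each -eol value to its byte sequence and guesses which convention a piece of output uses. The names EOL_BYTES and detect_eol are illustrative and are not part of pdf2text. The detection only inspects which terminators occur in the data.

```python
# Byte sequences behind the three documented -eol values.
EOL_BYTES = {
    "unix": b"\n",    # LF
    "dos": b"\r\n",   # CR LF
    "mac": b"\r",     # CR
}


def detect_eol(data: bytes) -> str:
    """Guess the end-of-line convention used in converted output."""
    # CR LF must be checked first because it contains both other sequences.
    if b"\r\n" in data:
        return "dos"
    if b"\r" in data:
        return "mac"
    if b"\n" in data:
        return "unix"
    raise ValueError("no line terminator found")
```

A check like this can confirm that a downstream tool receives the line endings it expects.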

-page-break-mode

Specify the page break mode for the text output.

Default:
none
Allowed Values:
  • none — No page breaks are inserted.
  • default — The standard page break code "FF" is used.
  • custom — A custom page break is used.
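
With page breaks enabled, the text output can be split back into pages. The following is a sketch only. It assumes that the default mode emits a single form feed character (ASCII 0x0C) between pages and that a custom page break string appears verbatim in the output. The function split_pages is hypothetical.

```python
FORM_FEED = "\f"  # ASCII 0x0C, the "FF" control character


def split_pages(text: str, page_break: str = FORM_FEED) -> list[str]:
    """Split converted text into per-page strings at each page break."""
    pages = text.split(page_break)
    # If the text ends with a page break, split() leaves an empty final
    # piece; drop it so it is not counted as an extra page.
    if pages and pages[-1] == "":
        pages.pop()
    return pages
```

For a custom break, pass the same string that was given to the converter, for example split_pages(text, "END OF PAGE").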

-custom-page-break

Specify the custom page break.

Examples:
  • Clear text between pages: "END OF PAGE"
  • Visual separator with line break
    "----my page break----
    "

-paragraph-mode

Specify the paragraph detection mode.

Default:
none
Allowed Values:
  • none — No paragraph detection.
  • bounding-box — Paragraph detection based on line bounding boxes.
  • characters — Paragraph detection based on the number of characters in the line.

-line-spacing-threshold

Set the maximum line spacing when the paragraph detection mode is enabled.

Constraint:
  • The value must be a positive integer percentage.
Default:
10%

-remove-hyphenation

Remove the hyphen character from the end of lines.

Default:
False
Allowed Values:
  • true, 1 or on
  • false, 0 or off

-remove-empty-lines

Remove empty lines from the text output.

Default:
False
Allowed Values:
  • true, 1 or on
  • false, 0 or off

-crop-area-x

Set the top-left X coordinate of the crop area in points.

Constraint:
  • Must be a positive integer or 0.
Example:
  • Start extraction at 1.4 inches from left: 100

-crop-area-y

Set the top-left Y coordinate of the crop area in points.

Constraint:
  • Must be a positive integer or 0.
Example:
  • Start extraction at 1.4 inches from top: 100

-crop-area-width

Set the width of the crop area in points.

Constraint:
  • Must be a positive integer or 0.
Default:
PDF page width.
Example:
  • Extract narrow 1.4 inch width: 100

-crop-area-height

Set the height of the crop area in points.

Constraint:
  • Must be a positive integer or 0.
Default:
PDF page height.
Example:
  • Extract small 1.4 inch height: 100
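
The crop options take whole points, and one PDF point is 1/72 inch, so 100 points is roughly 1.4 inches. The sketch below converts inches to points and assembles the four crop options into an argument list. The helper names are hypothetical. Passing each value as the argument following its option is an assumption that the usage line above does not confirm.

```python
POINTS_PER_INCH = 72  # one PDF point is 1/72 inch


def inches_to_points(inches: float) -> int:
    """Convert inches to the nearest whole number of points."""
    points = round(inches * POINTS_PER_INCH)
    if points < 0:
        raise ValueError("crop values must be a positive integer or 0")
    return points


def crop_area_args(x_in: float, y_in: float,
                   width_in: float, height_in: float) -> list[str]:
    """Build the four crop options as a flat argument list."""
    values = {
        "-crop-area-x": x_in,
        "-crop-area-y": y_in,
        "-crop-area-width": width_in,
        "-crop-area-height": height_in,
    }
    args: list[str] = []
    for option, inches in values.items():
        # Assumption: each option is followed by its value as the next argument.
        args += [option, str(inches_to_points(inches))]
    return args
```

For instance, crop_area_args(1, 1, 6.5, 9) describes a 6.5 by 9 inch region that starts one inch from the left and top edges.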

Miscellaneous

-debug-log

Turn on debug logging. Details about the conversion are stored in the debug log. The URL of the log can be obtained from the getDebugLogUrl method or from the conversion statistics.

Default:
False
Allowed Values:
  • true, 1 or on
  • false, 0 or off

-tag

Tag the conversion with a custom value. The tag is used in conversion statistics. Values longer than 32 characters are truncated.

Example:
  • Track job in analytics: "client-1234"
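
Because long tags are shortened, two tags that differ only after the 32nd character would become identical in the statistics. The sketch below previews the stored value, assuming the cut keeps the first 32 characters. The source does not say which end is kept, and effective_tag is a hypothetical helper.

```python
MAX_TAG_LENGTH = 32  # documented limit for the -tag value


def effective_tag(tag: str) -> str:
    """Return the tag as it would be stored if the first 32 characters are kept."""
    # Assumption: truncation keeps the beginning of the value.
    return tag[:MAX_TAG_LENGTH]
```

Comparing effective_tag(a) with effective_tag(b) before submitting jobs can reveal tags that would collide.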

-http-proxy

A proxy server used by the conversion process to access source URLs that use the HTTP scheme. It can help to circumvent regional restrictions or provide limited access to your intranet.

Constraint:
  • The value must have the format DOMAIN_OR_IP_ADDRESS:PORT.
Examples:
  • Corporate proxy server: "myproxy.com:8080"
  • Direct IP proxy connection: "113.25.84.10:33333"

-https-proxy

A proxy server used by the conversion process to access source URLs that use the HTTPS scheme. It can help to circumvent regional restrictions or provide limited access to your intranet.

Constraint:
  • The value must have the format DOMAIN_OR_IP_ADDRESS:PORT.
Examples:
  • Secure proxy for HTTPS: "myproxy.com:443"
  • Direct secure proxy IP: "113.25.84.10:44333"
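
Both proxy options require the DOMAIN_OR_IP_ADDRESS:PORT format, with no scheme prefix. The sketch below checks a value before it is passed on. The pattern is an approximation chosen for illustration: it accepts host names and IPv4 addresses only, and it may be stricter or looser than the converter's own check. parse_proxy is a hypothetical helper.

```python
import re

# Approximate pattern: letters, digits, dots and hyphens, then ":" and a port.
PROXY_PATTERN = re.compile(r"(?P<host>[A-Za-z0-9.-]+):(?P<port>\d{1,5})")


def parse_proxy(value: str) -> tuple[str, int]:
    """Split a DOMAIN_OR_IP_ADDRESS:PORT value into host and port."""
    match = PROXY_PATTERN.fullmatch(value)
    if match is None:
        raise ValueError(f"expected DOMAIN_OR_IP_ADDRESS:PORT, got {value!r}")
    port = int(match.group("port"))
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return match.group("host"), port
```

A value such as "http://myproxy.com:8080" is rejected because the scheme prefix is not part of the required format.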