PDF to Text Command Line Reference

Conversion from PDF to text.

usage: pdf2text [options] source
source
Source to be converted. It can be a URL, a path to a local file, or '-' to read the input from stdin.
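
The usage line above can be assembled programmatically. The following is a minimal Python sketch, not part of the tool itself: the helper name build_pdf2text_command is hypothetical, and it assumes each option is passed as "-name" followed by its value as a separate argument, which this reference does not spell out.

```python
def build_pdf2text_command(source, options=None):
    """Return an argv list of the form: pdf2text [options] source."""
    argv = ["pdf2text"]
    for name, value in (options or {}).items():
        # Assumption: an option is written as "-name" followed by its value.
        argv += ["-" + name, str(value)]
    # The source (URL, local path, or "-" for stdin) always comes last.
    argv.append(source)
    return argv


cmd = build_pdf2text_command("report.pdf", {"eol": "dos", "no-layout": "true"})
print(" ".join(cmd))  # pdf2text -eol dos -no-layout true report.pdf
```

The resulting list can be handed to subprocess.run once the pdf2text executable is available on the PATH.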

Options

General Options

-pdf-password
The password to open the encrypted PDF file.
Set the page range to print.
Constraint:
  • A comma separated list of page numbers or ranges.
Examples:
  • Just the second page is printed.
    "2"
  • The first and the third page are printed.
    "1,3"
  • Everything except the first page is printed.
    "2-"
  • Just the first 3 pages are printed.
    "-3"
  • Pages 3, 6, 7, 8 and 9 are printed.
    "3,6-9"
-no-layout
Ignore the original PDF layout.
Allowed values:
  • true, 1 or on
  • false, 0 or off
Default: False
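
Several options in this reference accept the same set of boolean spellings. A wrapper script could normalize them before building a command line, as in the Python sketch below. The helper name parse_flag is hypothetical, and treating the values as case-insensitive is an assumption made here, not something this reference states.

```python
TRUE_VALUES = {"true", "1", "on"}
FALSE_VALUES = {"false", "0", "off"}


def parse_flag(value):
    """Map one of the documented boolean spellings to a Python bool."""
    # Assumption: matching is case-insensitive and ignores surrounding spaces.
    text = str(value).strip().lower()
    if text in TRUE_VALUES:
        return True
    if text in FALSE_VALUES:
        return False
    raise ValueError("not an allowed boolean value: %r" % (value,))


print(parse_flag("on"))   # True
print(parse_flag("0"))    # False
```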
-eol
The end-of-line convention for the text output.
Allowed values:
  • unix
    Unix convention "LF" is used.
  • dos
    DOS convention "CR LF" is used.
  • mac
    Mac convention "CR" is used.
Default: unix
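
The three conventions correspond to the following character sequences. The Python sketch below shows how already-extracted text could be converted between them after the fact; it is an illustration only, and the names EOL_SEQUENCES and convert_eol are introduced here rather than taken from the tool.

```python
# LF, CR LF and CR expressed as Python escape sequences.
EOL_SEQUENCES = {"unix": "\n", "dos": "\r\n", "mac": "\r"}


def convert_eol(text, convention):
    """Rewrite all line endings in text to the requested convention."""
    # Normalize every existing ending to LF first, then apply the target.
    normalized = text.replace("\r\n", "\n").replace("\r", "\n")
    return normalized.replace("\n", EOL_SEQUENCES[convention])


print(repr(convert_eol("first\nsecond\n", "dos")))  # 'first\r\nsecond\r\n'
```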
-page-break-mode
Specify the page break mode for the text output.
Allowed values:
  • none
    No page breaks are inserted.
  • default
    The standard page break code "FF" is used.
  • custom
    A custom page break is used.
Default: none
-custom-page-break
Specify the custom page break.
Examples:
  • "END OF PAGE"
  • "----my page break---- "
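
Page breaks make it possible to split the converted text back into pages. The Python sketch below handles both the "FF" code (form feed, ASCII 0x0C) and a custom marker. The function name split_pages is hypothetical, and dropping a trailing empty page is an assumption about how the output might end, not documented behavior.

```python
FORM_FEED = "\f"  # ASCII 0x0C, the "FF" code used by the default mode


def split_pages(text, page_break=FORM_FEED):
    """Split converted text into a list of per-page strings."""
    pages = text.split(page_break)
    # Assumption: a break may follow the last page, leaving an empty tail.
    if pages and pages[-1].strip() == "":
        pages.pop()
    return pages


print(split_pages("page one\fpage two\f"))            # ['page one', 'page two']
print(len(split_pages("a\nEND OF PAGE\nb", "END OF PAGE")))  # 2
```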
-paragraph-mode
Specify the paragraph detection mode.
Allowed values:
  • none
    No paragraph detection.
  • bounding-box
    Paragraph detection based on line bounding boxes.
  • characters
    Paragraph detection based on the number of characters in the line.
Default: none
-line-spacing-threshold
Set the maximum line spacing when the paragraph detection mode is enabled.
Constraint:
  • The value must be a positive integer percentage.
Default: 10%
-remove-hyphenation
Remove the hyphen character from the end of lines.
Allowed values:
  • true, 1 or on
  • false, 0 or off
Default: False
-remove-empty-lines
Remove empty lines from the text output.
Allowed values:
  • true, 1 or on
  • false, 0 or off
Default: False
-crop-area-x
Set the top left X coordinate of the crop area in points.
Constraint:
  • Must be a positive integer number or 0.
Default: 0
Example:
  • 100
-crop-area-y
Set the top left Y coordinate of the crop area in points.
Constraint:
  • Must be a positive integer number or 0.
Default: 0
Example:
  • 100
-crop-area-width
Set the width of the crop area in points.
Constraint:
  • Must be a positive integer number or 0.
Default: PDF page width.
Example:
  • 100
-crop-area-height
Set the height of the crop area in points.
Constraint:
  • Must be a positive integer number or 0.
Default: PDF page height.
Example:
  • 100
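
The four crop options take integer values in points, where 1 inch equals 72 points. As a hedged illustration, the Python sketch below converts a rectangle given in inches into the corresponding option values. The function name crop_area_options is an assumption, and rounding to the nearest point is a choice made for this example.

```python
POINTS_PER_INCH = 72


def crop_area_options(x_in, y_in, width_in, height_in):
    """Convert a crop rectangle in inches to integer point values per option."""
    inches = {
        "crop-area-x": x_in,
        "crop-area-y": y_in,
        "crop-area-width": width_in,
        "crop-area-height": height_in,
    }
    options = {}
    for name, value in inches.items():
        points = round(value * POINTS_PER_INCH)
        # The reference requires a positive integer or 0 for every crop option.
        if points < 0:
            raise ValueError(name + " must not be negative")
        options[name] = points
    return options


# A 1 inch margin on every side of an 8.5 x 11 inch page.
print(crop_area_options(1, 1, 6.5, 9))
```

The returned dictionary maps option names (without the leading dash) to point values: 72, 72, 468 and 648 for the call above.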

Miscellaneous

-debug-log
Turn on debug logging. Details about the conversion are stored in the debug log. The URL of the log can be obtained from the getDebugLogUrl method or from the conversion statistics.
Allowed values:
  • true, 1 or on
  • false, 0 or off
Default: False
-tag
Tag the conversion with a custom value. The tag is used in conversion statistics. A value longer than 32 characters is cut off.
Example:
  • "client-1234"
-http-proxy
A proxy server used by the Pdfcrowd conversion process to access source URLs with the HTTP scheme. It can help to circumvent regional restrictions or provide limited access to your intranet.
Constraint:
  • The value must have format DOMAIN_OR_IP_ADDRESS:PORT.
Examples:
  • "myproxy.com:8080"
  • "113.25.84.10:33333"
-https-proxy
A proxy server used by the Pdfcrowd conversion process to access source URLs with the HTTPS scheme. It can help to circumvent regional restrictions or provide limited access to your intranet.
Constraint:
  • The value must have format DOMAIN_OR_IP_ADDRESS:PORT.
Examples:
  • "myproxy.com:443"
  • "113.25.84.10:44333"
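
A proxy value can be checked against the DOMAIN_OR_IP_ADDRESS:PORT format before it is passed to either option. The Python sketch below is one possible check, not the validation the service performs. The pattern, the name is_valid_proxy, and the port limit of 65535 are assumptions made for this example.

```python
import re

# Host made of letters, digits, dots and hyphens, then a colon and a numeric port.
PROXY_PATTERN = re.compile(r"(?P<host>[A-Za-z0-9.-]+):(?P<port>[0-9]{1,5})")


def is_valid_proxy(value):
    """Return True if value looks like DOMAIN_OR_IP_ADDRESS:PORT."""
    match = PROXY_PATTERN.fullmatch(value)
    if match is None:
        return False
    # Assumption: the port must lie in the usual TCP range.
    return 1 <= int(match.group("port")) <= 65535


print(is_valid_proxy("myproxy.com:8080"))         # True
print(is_valid_proxy("http://myproxy.com:8080"))  # False
```

Note that a scheme prefix such as "http://" is rejected, since the documented format contains only the host and the port.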