PDF to Text PHP Reference

class PdfToTextClient

All setter methods return the PdfToTextClient object unless specified otherwise.

Constructor

function __construct($user_name, $api_key)
Constructor for the Pdfcrowd API client.
user_name
Your username at Pdfcrowd.
api_key
Your API key.

Conversion Input

function convertUrl($url)
Convert a PDF.
url
The address of the PDF to convert.
The supported protocols are http:// and https://.
Returns
  • byte[] - Byte array containing the conversion output.
function convertUrlToStream($url, $out_stream)
Convert a PDF and write the result to an output stream.
url
The address of the PDF to convert.
The supported protocols are http:// and https://.
out_stream
The output stream that will contain the conversion output.
function convertUrlToFile($url, $file_path)
Convert a PDF and write the result to a local file.
url
The address of the PDF to convert.
The supported protocols are http:// and https://.
file_path
The output file path.
function convertFile($file)
Convert a local file.
file
The path to a local file to convert.
The file must exist and not be empty.
Returns
  • byte[] - Byte array containing the conversion output.
function convertFileToStream($file, $out_stream)
Convert a local file and write the result to an output stream.
file
The path to a local file to convert.
The file must exist and not be empty.
out_stream
The output stream that will contain the conversion output.
function convertFileToFile($file, $file_path)
Convert a local file and write the result to a local file.
file
The path to a local file to convert.
The file must exist and not be empty.
file_path
The output file path.
function convertRawData($data)
Convert raw data.
data
The raw content to be converted.
Returns
  • byte[] - Byte array with the output.
function convertRawDataToStream($data, $out_stream)
Convert raw data and write the result to an output stream.
data
The raw content to be converted.
out_stream
The output stream that will contain the conversion output.
function convertRawDataToFile($data, $file_path)
Convert raw data to a file.
data
The raw content to be converted.
file_path
The output file path.
function convertStream($in_stream)
Convert the contents of an input stream.
in_stream
The input stream with source data.
Returns
  • byte[] - Byte array containing the conversion output.
function convertStreamToStream($in_stream, $out_stream)
Convert the contents of an input stream and write the result to an output stream.
in_stream
The input stream with source data.
out_stream
The output stream that will contain the conversion output.
function convertStreamToFile($in_stream, $file_path)
Convert the contents of an input stream and write the result to a local file.
in_stream
The input stream with source data.
file_path
The output file path.
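
Example (a minimal sketch, not an official snippet): the helper below wraps convertFileToFile and checks the documented precondition that the input file exists and is not empty. The helper name convertPdfToTextFile, the file names, and the \Pdfcrowd namespace shown in the usage comment are assumptions; adjust them to match how the client library is loaded in your project.

```php
<?php
/**
 * Convert a local PDF to a text file and return the output size in bytes.
 * $client is expected to be a PdfToTextClient instance.
 * The helper name is hypothetical; it is not part of the client library.
 */
function convertPdfToTextFile($client, string $pdfPath, string $txtPath): int
{
    // The reference states that the input file must exist and not be empty.
    if (!is_file($pdfPath) || filesize($pdfPath) === 0) {
        throw new InvalidArgumentException("Input file must exist and not be empty: $pdfPath");
    }

    $client->convertFileToFile($pdfPath, $txtPath);

    // getOutputSize() reports the size of the output in bytes.
    return $client->getOutputSize();
}

// Usage sketch (placeholder credentials; the namespace is an assumption):
// $client = new \Pdfcrowd\PdfToTextClient("your_username", "your_api_key");
// $bytes = convertPdfToTextFile($client, "invoice.pdf", "invoice.txt");
```

The same pattern applies to the other convertXtoY methods; the methods without a ToStream or ToFile suffix return the output instead of writing it.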

General Options

function setPdfPassword($password)
The password to open the encrypted PDF file.
password
The input PDF password.
function setPrintPageRange($pages)
Set the page range to print.
pages
A comma-separated list of page numbers or ranges.
Examples:
  • Just the second page is printed.
    setPrintPageRange("2")
  • The first and the third page are printed.
    setPrintPageRange("1,3")
  • Everything except the first page is printed.
    setPrintPageRange("2-")
  • Just the first 3 pages are printed.
    setPrintPageRange("-3")
  • Pages 3, 6, 7, 8 and 9 are printed.
    setPrintPageRange("3,6-9")
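To make the range syntax concrete, the following local helper expands a range string into page numbers, following the examples above. It is an illustration only: expandPageRange is a hypothetical function, not part of the client library, and it is not the parser the service uses.

```php
<?php
/**
 * Illustration only: expand a page-range string such as "3,6-9" into the
 * list of selected page numbers for a document with $pageCount pages.
 */
function expandPageRange(string $pages, int $pageCount): array
{
    $selected = [];
    foreach (explode(',', $pages) as $part) {
        $part = trim($part);
        if (strpos($part, '-') === false) {
            // A single page number, e.g. "2".
            $first = $last = (int) $part;
        } else {
            // A range; an open end means "from the first" or "to the last" page.
            [$from, $to] = explode('-', $part, 2);
            $first = $from === '' ? 1 : (int) $from;
            $last = $to === '' ? $pageCount : (int) $to;
        }
        for ($page = max(1, $first); $page <= min($pageCount, $last); $page++) {
            $selected[$page] = true;
        }
    }
    $result = array_keys($selected);
    sort($result);
    return $result;
}
```

For a 10-page document, expandPageRange("3,6-9", 10) yields pages 3, 6, 7, 8 and 9, matching the last example above.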
function setNoLayout($value)
Ignore the original PDF layout.
value
Set to true to ignore the layout.
Default: false
function setEol($eol)
The end-of-line convention for the text output.
eol
Allowed values:
  • unix
    Unix convention "LF" is used.
  • dos
    Dos convention "CR LF" is used.
  • mac
    Mac convention "CR" is used.
Default: unix
function setPageBreakMode($mode)
Specify the page break mode for the text output.
mode
Allowed values:
  • none
    No page breaks are inserted.
  • default
    The standard page break code "FF" is used.
  • custom
    A custom page break is used.
Default: none
function setCustomPageBreak($page_break)
Specify the custom page break.
page_break
String to insert between the pages.
Examples:
  • setCustomPageBreak("END OF PAGE")
  • setCustomPageBreak("----my page break---- ")
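Example (a sketch under stated assumptions): because setters return the client, the end-of-line and page break options can be chained. The helper name configureTextOutput is hypothetical, and the code assumes that the custom page break string takes effect only when the page break mode is set to custom.

```php
<?php
/**
 * Configure Windows-style line endings and a custom page separator.
 * $client is expected to be a PdfToTextClient instance.
 */
function configureTextOutput($client, string $pageBreak)
{
    return $client
        ->setEol("dos")                     // "CR LF" line endings
        ->setPageBreakMode("custom")        // assumed prerequisite for the next call
        ->setCustomPageBreak($pageBreak);   // string inserted between pages
}

// Usage sketch: configureTextOutput($client, "END OF PAGE");
```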
function setParagraphMode($mode)
Specify the paragraph detection mode.
mode
Allowed values:
  • none
    No paragraph detection.
  • bounding-box
    Paragraph detection based on line bounding boxes.
  • characters
    Paragraph detection based on the number of characters in the line.
Default: none
function setLineSpacingThreshold($threshold)
Set the maximum line spacing when the paragraph detection mode is enabled.
threshold
The value must be a positive integer percentage.
Default: 10%
function setRemoveHyphenation($value)
Remove the hyphen character from the end of lines.
value
Set to true to remove hyphens.
Default: false
function setRemoveEmptyLines($value)
Remove empty lines from the text output.
value
Set to true to remove empty lines.
Default: false
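Example (a sketch, not a recommended configuration): paragraph detection can be combined with the clean-up options above to produce reflowable text. The helper name configureParagraphCleanup is hypothetical, and passing the threshold as a string such as "15%" is an assumption based on the default being shown as 10%.

```php
<?php
/**
 * Enable bounding-box paragraph detection and strip hyphens and empty lines.
 * $client is expected to be a PdfToTextClient instance.
 */
function configureParagraphCleanup($client, int $thresholdPercent = 10)
{
    // The reference requires a positive integer percentage.
    if ($thresholdPercent <= 0) {
        throw new InvalidArgumentException("Threshold must be a positive integer percentage.");
    }

    return $client
        ->setParagraphMode("bounding-box")
        ->setLineSpacingThreshold($thresholdPercent . "%") // assumed "N%" string format
        ->setRemoveHyphenation(true)
        ->setRemoveEmptyLines(true);
}
```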
function setCropAreaX($x)
Set the top left X coordinate of the crop area in points.
x
Must be a positive integer number or 0.
Default: 0
Example:
  • setCropAreaX(100)
function setCropAreaY($y)
Set the top left Y coordinate of the crop area in points.
y
Must be a positive integer number or 0.
Default: 0
Example:
  • setCropAreaY(100)
function setCropAreaWidth($width)
Set the width of the crop area in points.
width
Must be a positive integer number or 0.
Default: PDF page width.
Example:
  • setCropAreaWidth(100)
function setCropAreaHeight($height)
Set the height of the crop area in points.
height
Must be a positive integer number or 0.
Default: PDF page height.
Example:
  • setCropAreaHeight(100)
function setCropArea($x, $y, $width, $height)
Set the crop area. This lets you extract just a part of a PDF page.
x
Set the top left X coordinate of the crop area in points.
Must be a positive integer number or 0.
Default: 0
y
Set the top left Y coordinate of the crop area in points.
Must be a positive integer number or 0.
Default: 0
width
Set the width of the crop area in points.
Must be a positive integer number or 0.
Default: PDF page width.
height
Set the height of the crop area in points.
Must be a positive integer number or 0.
Default: PDF page height.
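
Example (a sketch): the crop area is given in points as integers, so measurements in other units need converting first. The helpers below use the standard definition of 1 point = 1/72 inch (25.4 mm per inch). Both mmToPoints and setCropAreaMm are hypothetical names, not part of the client library.

```php
<?php
/** Convert millimetres to whole points (1 pt = 1/72 inch, 1 inch = 25.4 mm). */
function mmToPoints(float $mm): int
{
    return (int) round($mm * 72 / 25.4);
}

/**
 * Set the crop area from millimetre measurements.
 * $client is expected to be a PdfToTextClient instance.
 */
function setCropAreaMm($client, float $x, float $y, float $width, float $height)
{
    return $client->setCropArea(
        mmToPoints($x),
        mmToPoints($y),
        mmToPoints($width),
        mmToPoints($height)
    );
}

// Usage sketch: a strip one inch high across the width of an A4 page.
// setCropAreaMm($client, 0, 0, 210, 25.4);
```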

Miscellaneous

function setDebugLog($value)
Turn on debug logging. Details about the conversion are stored in the debug log. The URL of the log can be obtained from the getDebugLogUrl method and is also available in the conversion statistics.
value
Set to true to enable the debug logging.
Default: false
function getDebugLogUrl()
Get the URL of the debug log for the last conversion.
Returns
  • string - The link to the debug log.
function getRemainingCreditCount()
Get the number of conversion credits available in your account.
This method can only be called after a call to one of the convertXtoY methods.
The returned value can differ from the actual count if you run parallel conversions.
The special value 999999 is returned if the information is not available.
Returns
  • int - The number of credits.
function getConsumedCreditCount()
Get the number of credits consumed by the last conversion.
Returns
  • int - The number of credits.
function getJobId()
Get the job id.
Returns
  • string - The unique job identifier.
function getPageCount()
Get the number of pages in the output document.
Returns
  • int - The page count.
function getOutputSize()
Get the size of the output in bytes.
Returns
  • int - The count of bytes.
function getVersion()
Get the version details.
Returns
  • string - API version, converter version, and client version.
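Example (a sketch): the getters above can be collected into a single summary after a conversion. The helper name conversionSummary and the array keys are hypothetical. The special value 999999 from getRemainingCreditCount is mapped to null so that callers do not mistake it for a real balance.

```php
<?php
/**
 * Collect statistics about the last conversion.
 * $client is expected to be a PdfToTextClient instance on which one of the
 * convertXtoY methods has already been called.
 */
function conversionSummary($client): array
{
    $remaining = (int) $client->getRemainingCreditCount();

    return [
        'job_id'       => $client->getJobId(),
        'pages'        => $client->getPageCount(),
        'bytes'        => $client->getOutputSize(),
        'credits_used' => $client->getConsumedCreditCount(),
        // 999999 means the information is not available.
        'credits_left' => $remaining === 999999 ? null : $remaining,
    ];
}
```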
function setTag($tag)
Tag the conversion with a custom value. The tag is used in conversion statistics. A value longer than 32 characters is cut off.
tag
A string with the custom tag.
Example:
  • setTag("client-1234")
function setHttpProxy($proxy)
A proxy server used by the Pdfcrowd conversion process to access source URLs with the HTTP scheme. It can help to circumvent regional restrictions or provide limited access to your intranet.
proxy
The value must have the format DOMAIN_OR_IP_ADDRESS:PORT.
Examples:
  • setHttpProxy("myproxy.com:8080")
  • setHttpProxy("113.25.84.10:33333")
function setHttpsProxy($proxy)
A proxy server used by the Pdfcrowd conversion process to access source URLs with the HTTPS scheme. It can help to circumvent regional restrictions or provide limited access to your intranet.
proxy
The value must have the format DOMAIN_OR_IP_ADDRESS:PORT.
Examples:
  • setHttpsProxy("myproxy.com:443")
  • setHttpsProxy("113.25.84.10:44333")

API Client Options

function setUseHttp($value)
Specifies whether the client communicates with the Pdfcrowd API over HTTP or HTTPS.
value
Set to true to use HTTP.
Default: false

Warning

Using HTTP is insecure as data sent over HTTP is not encrypted. Enable this option only if you know what you are doing.

function setUserAgent($agent)
Set a custom user agent HTTP header. It can be useful if you are behind a proxy or a firewall.
agent
The user agent string.
Default: pdfcrowd_php_client/5.12.1 (https://pdfcrowd.com)
function setProxy($host, $port, $user_name, $password)
Specifies an HTTP proxy that the API client library will use to connect to the internet.
host
The proxy hostname.
port
The proxy port.
user_name
The username.
password
The password.
function setUseCurl($value)
Use cURL for the conversion request instead of the file_get_contents() PHP function.
value
Set to true to use PHP's cURL.
Default: false
function setRetryCount($count)
Specifies the number of automatic retries when a 502 or 503 HTTP status code is received. These status codes indicate a temporary network issue. Set the count to 0 to disable this feature.
count
Number of retries.
Default: 1
Example:
  • setRetryCount(3)
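
Example (a sketch): the client options can be applied once, right after the client is constructed. The helper name configureTransport is hypothetical; whether cURL is preferable to file_get_contents() depends on your PHP installation.

```php
<?php
/**
 * Use cURL for requests and retry on 502 or 503 responses.
 * $client is expected to be a PdfToTextClient instance.
 */
function configureTransport($client, int $retries = 3)
{
    // A count of 0 disables retries; negative values make no sense.
    if ($retries < 0) {
        throw new InvalidArgumentException("Retry count must be 0 or greater.");
    }

    return $client
        ->setUseCurl(true)
        ->setRetryCount($retries);
}
```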