PDF to Text / Ruby Reference

Constructor

`def initialize(user_name, api_key)`

Constructor for the PDFCrowd API client. Initialize a new instance of the conversion client with your PDFCrowd account credentials.

You must provide both your username and API key. This establishes the authenticated connection for all subsequent conversion operations.

Parameters:

user_name - Your username at PDFCrowd.
api_key - Your API key.
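
Since both credentials are required, it can help to fail early when either one is missing. The following is a minimal sketch that reads them from the environment; the variable names `PDFCROWD_USERNAME` and `PDFCROWD_API_KEY` and the helper `pdfcrowd_credentials` are assumptions made for illustration and are not part of the PDFCrowd API.

```ruby
# Hypothetical helper: returns [user_name, api_key] read from the given
# environment-like object (ENV by default). The variable names are assumptions.
def pdfcrowd_credentials(env = ENV)
  user_name = env['PDFCROWD_USERNAME'].to_s.strip
  api_key = env['PDFCROWD_API_KEY'].to_s.strip

  # The constructor needs both values, so refuse to continue without them.
  if user_name.empty? || api_key.empty?
    raise ArgumentError, 'both PDFCROWD_USERNAME and PDFCROWD_API_KEY must be set'
  end

  [user_name, api_key]
end
```

With the client library loaded, the returned pair could be splatted into the constructor, for example `PdfToTextClient.new(*pdfcrowd_credentials)`, qualified with whatever module the library uses for its classes.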

Conversion Input

`def convertUrl(url)`

Convert a PDF from a URL.

Use this to convert a PDF that is available at a publicly accessible URL. The conversion result is returned as a byte array for further processing or direct use.

Parameter:

url - The address of the PDF to convert.
Constraint:
- Supported protocols are http:// and https://.

Returns:

byte[] - Byte array containing the conversion output.
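
The protocol constraint can be checked locally before a request is sent. This is a sketch using Ruby's standard `uri` library; the helper name `supported_source_url?` is hypothetical, and the check only mirrors the documented constraint (http:// or https://), not any additional validation the service may perform.

```ruby
require 'uri'

# Hypothetical pre-flight check: true only for http:// and https:// URLs
# that include a host. URI::HTTPS is a subclass of URI::HTTP, so a single
# is_a? test covers both schemes.
def supported_source_url?(url)
  uri = URI.parse(url)
  uri.is_a?(URI::HTTP) && !uri.host.to_s.empty?
rescue URI::InvalidURIError
  false
end
```

A caller might skip or report URLs for which this returns false instead of spending a request on them.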

`def convertUrlToStream(url, out_stream)`

Convert a PDF from a URL and write the conversion result directly to an output stream.

Use this when you need to handle large conversion results, integrate with streaming pipelines, or build server applications that process conversions continuously.

Parameters:

url - The address of the PDF to convert.
Constraint:
- Supported protocols are http:// and https://.
out_stream (OutputStream) - The output stream that will contain the conversion output.

`def convertUrlToFile(url, file_path)`

Convert a PDF from a URL and save the conversion result directly to a local file.

Use this for simple file-based workflows, batch processing, or when you need to persist conversion output to disk. This is the most straightforward method for URL-to-file conversions.

Parameters:

url - The address of the PDF to convert.
Constraint:
- Supported protocols are http:// and https://.
file_path - The output file path.

`def convertFile(file)`

Convert a local file to the desired output format.

Use this for processing files already on your system, converting uploaded documents, or batch processing local content. Returns the conversion result as a byte array for in-memory processing.

Parameter:

file - The path to a local file to convert.
Constraint:
- The file must exist and not be empty.

Returns:

byte[] - Byte array containing the conversion output.

`def convertFileToStream(file, out_stream)`

Convert a local file and write the conversion result directly to an output stream.

Use this when working with large conversion results, integrating with streaming frameworks, or building applications that need direct stream-to-stream processing.

Parameters:

file - The path to a local file to convert.
Constraint:
- The file must exist and not be empty.
out_stream (OutputStream) - The output stream that will contain the conversion output.

`def convertFileToFile(file, file_path)`

Convert a local file and save the conversion result to another local file.

Use this for file-based batch processing, document transformation workflows, or when both input and output are file-based. This is the simplest method for file-to-file conversions.

Parameters:

file - The path to a local file to convert.
Constraint:
- The file must exist and not be empty.
file_path - The output file path.
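
The constraint that the input file must exist and not be empty can be enforced before calling the client. The sketch below assumes `client` is any object that responds to `convertFileToFile(file, file_path)` as documented above; the wrapper name `convert_local_pdf` is hypothetical.

```ruby
# Hypothetical wrapper: converts input_path to output_path only when the
# input satisfies the documented constraint. Returns true if the conversion
# was attempted, false if the input was skipped.
def convert_local_pdf(client, input_path, output_path)
  # File.size? returns nil for a missing or zero-length file.
  return false unless File.file?(input_path) && File.size?(input_path)

  client.convertFileToFile(input_path, output_path)
  true
end
```

In a batch job, the false return value gives a simple way to log skipped inputs without raising.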

`def convertRawData(data)`

Convert raw binary data to the desired output format.

Use this for processing binary content, handling file uploads as byte arrays, or when working with data from external APIs. Provides maximum flexibility for binary data conversions.

Parameter:

data (byte[]) - The raw content to be converted.

Returns:

byte[] - Byte array with the output.

`def convertRawDataToStream(data, out_stream)`

Convert raw binary data and write the conversion result directly to an output stream.

Use this when handling large binary data with substantial conversion results, integrating with streaming systems, or building high-performance conversion services.

Parameters:

data (byte[]) - The raw content to be converted.
out_stream (OutputStream) - The output stream that will contain the conversion output.

`def convertRawDataToFile(data, file_path)`

Convert raw binary data and save the conversion result to a local file.

Use this for processing binary uploads and persisting the output, handling data from external sources, or when working with byte array inputs that need file-based storage.

Parameters:

data (byte[]) - The raw content to be converted.
file_path - The output file path.

`def convertStream(in_stream)`

Convert content from an input stream to the desired output format.

Use this when integrating with I/O pipelines, processing data from network streams or file handles, or when the source data is provided as a stream by your application.

Parameter:

in_stream (InputStream) - The input stream with source data.

Returns:

byte[] - Byte array containing the conversion output.

`def convertStreamToStream(in_stream, out_stream)`

Convert content from an input stream and write the conversion result to an output stream.

Use this when both input and output need to be streams, integrating with streaming frameworks, or building conversion services that process data in stream form throughout.

Parameters:

in_stream (InputStream) - The input stream with source data.
out_stream (OutputStream) - The output stream that will contain the conversion output.

`def convertStreamToFile(in_stream, file_path)`

Convert content from an input stream and save the conversion result to a local file.

Use this when processing streaming uploads that need to be saved, handling network data sources with file-based output, or building services that accept stream input and produce file output.

Parameters:

in_stream (InputStream) - The input stream with source data.
file_path - The output file path.
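
When the extracted text is needed in memory rather than on disk, an in-memory stream can stand in for the output stream. This sketch assumes the client only needs the output stream to respond to `write`, which this reference does not state explicitly; `extract_text_from_stream` is a hypothetical helper name.

```ruby
require 'stringio'

# Hypothetical helper: runs a stream-to-stream conversion and returns the
# output as a String. Assumes the client writes the result via out_stream.write.
def extract_text_from_stream(client, in_stream)
  out_stream = StringIO.new
  client.convertStreamToStream(in_stream, out_stream)
  out_stream.string
end
```

An open file handle (for example one obtained from `File.open(path, 'rb')`) could be passed as `in_stream` in the same way.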

General Options

`def setPdfPassword(password)`

The password to open the encrypted PDF file.

Parameter:

password - The input PDF password.

`def setPrintPageRange(pages)`

Set the page range to print.

Parameter:

pages
Constraint:
- A comma separated list of page numbers or ranges.

Examples:

Just the second page is printed: setPrintPageRange("2")
The first and the third page are printed: setPrintPageRange("1,3")
Everything except the first page is printed: setPrintPageRange("2-")
Just the first 3 pages are printed: setPrintPageRange("-3")
Pages 3, 6, 7, 8 and 9 are printed: setPrintPageRange("3,6-9")
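
If the pages to print come from application data rather than a literal string, the range syntax shown in the examples can be generated. This is a sketch; `page_range` is a hypothetical helper, not part of the client library, and it only produces the closed forms ("3", "6-9"), not the open-ended "2-" or "-3" forms.

```ruby
# Hypothetical helper: turns a list of page numbers into the comma-separated
# syntax accepted by setPrintPageRange, collapsing consecutive pages into ranges.
def page_range(pages)
  pages.uniq.sort
       .slice_when { |previous, current| current != previous + 1 }
       .map { |run| run.size > 1 ? "#{run.first}-#{run.last}" : run.first.to_s }
       .join(',')
end
```

The result can be passed directly to the setter, for example `setPrintPageRange(page_range([3, 6, 7, 8, 9]))`.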

`def setNoLayout(value)`

Ignore the original PDF layout. Text is extracted in reading order without preserving column structure or positioning, which gives simpler output for pure text extraction.

Parameter:

value (bool) - Set to true to ignore the layout.

Default:

false

`def setEol(eol)`

The end-of-line convention for the text output.

Parameter:

eol
Allowed Values:
- unix — Unix convention "LF" is used.
- dos — DOS convention "CR LF" is used.
- mac — Mac convention "CR" is used.
Default:

unix
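
For reference, the three conventions correspond to the byte sequences below. The constant and helper are illustrative names only; the helper shows one way to split extracted text into lines regardless of which convention was selected.

```ruby
# The line terminators behind the three allowed values.
EOL_SEQUENCES = {
  'unix' => "\n",   # LF
  'dos' => "\r\n",  # CR LF
  'mac' => "\r"     # CR
}.freeze

# Hypothetical helper: splits text on any of the three conventions.
# "\r\n" is listed first so a DOS line ending is not split twice.
def split_lines(text)
  text.split(/\r\n|\r|\n/)
end
```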

`def setPageBreakMode(mode)`

Specify the page break mode for the text output.

Parameter:

mode
Allowed Values:
- none — No page breaks are inserted.
- default — The standard page break code "FF" is used.
- custom — A custom page break is used.
Default:

none
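
Once page breaks are present in the output, the text can be split back into pages on the client side. The sketch below assumes the "FF" code is the ASCII form feed character (`"\f"`); `split_pages` is a hypothetical helper, and the second argument covers a custom separator string when the custom mode is used.

```ruby
FORM_FEED = "\f" # ASCII 0x0C, assumed to be the "FF" code of the default mode

# Hypothetical helper: splits converted text into one string per page.
def split_pages(text, page_break = FORM_FEED)
  text.split(page_break)
end
```

With the mode left at none there is no separator to split on, so the whole document comes back as a single element.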

`def setCustomPageBreak(page_break)`

Specify the custom page break.

Parameter:

page_break - String to insert between the pages.

Examples:

Clear text between pages: setCustomPageBreak("END OF PAGE")

Visual separator with line break: setCustomPageBreak("----my page break----\n")

`def setParagraphMode(mode)`

Specify the paragraph detection mode. Use "none" for raw text, or one of the detection modes to format the output with paragraph breaks.

Parameter:

mode
Allowed Values:
- none — No paragraph detection.
- bounding-box — Paragraph detection based on line bounding boxes.
- characters — Paragraph detection based on the number of characters in the line.
Default:

none

`def setLineSpacingThreshold(threshold)`

Set the maximum line spacing when the paragraph detection mode is enabled.

Parameter:

threshold
Constraint:
- The value must be a positive integer percentage.
Default:

10%

`def setRemoveHyphenation(value)`

Remove the hyphen character from the end of lines.

Parameter:

value (bool) - Set to true to remove hyphens.

Default:

false

`def setRemoveEmptyLines(value)`

Remove empty lines from the text output.

Parameter:

value (bool) - Set to true to remove empty lines.

Default:

false
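
Because the options in this section are plain setters, a group of them can be applied from a single hash, which keeps per-job configuration in data. This is a sketch under the assumption that `client` responds to the setter names used as keys; `apply_options` is a hypothetical helper and does not validate option names or values.

```ruby
# Hypothetical helper: calls each setter named by a hash key with its value.
# Array values are splatted so multi-argument setters such as setCropArea
# can be expressed as setCropArea: [x, y, width, height].
def apply_options(client, options)
  options.each do |setter, value|
    client.public_send(setter, *Array(value))
  end
  client
end
```

A typical call might look like `apply_options(client, setNoLayout: true, setRemoveHyphenation: true, setRemoveEmptyLines: true)`.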

`def setCropAreaX(x)`

Set the top left X coordinate of the crop area in points.

Parameter:

x (int)
Constraint:
- Must be a positive integer or 0.

Example:

Start extraction at 1.4 inches from left: setCropAreaX(100)

`def setCropAreaY(y)`

Set the top left Y coordinate of the crop area in points.

Parameter:

y (int)
Constraint:
- Must be a positive integer or 0.

Example:

Start extraction at 1.4 inches from top: setCropAreaY(100)

`def setCropAreaWidth(width)`

Set the width of the crop area in points.

Parameter:

width (int)
Constraint:
- Must be a positive integer or 0.
Default:

PDF page width.

Example:

Extract narrow 1.4 inch width: setCropAreaWidth(100)

`def setCropAreaHeight(height)`

Set the height of the crop area in points.

Parameter:

height (int)
Constraint:
- Must be a positive integer or 0.
Default:

PDF page height.

Example:

Extract small 1.4 inch height: setCropAreaHeight(100)

`def setCropArea(x, y, width, height)`

Set the crop area. It allows you to extract just a part of a PDF page.

Parameters:

x (int) - Set the top left X coordinate of the crop area in points.
Constraint:
- Must be a positive integer or 0.
y (int) - Set the top left Y coordinate of the crop area in points.
Constraint:
- Must be a positive integer or 0.
width (int) - Set the width of the crop area in points.
Constraint:
- Must be a positive integer or 0.
Default:

PDF page width.
height (int) - Set the height of the crop area in points.
Constraint:
- Must be a positive integer or 0.
Default:

PDF page height.
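
All crop values are in points, and one inch is 72 points, which is why 100 points is roughly 1.4 inches in the examples above. The sketch below shows the unit conversion and one way to derive a crop area that excludes a uniform margin; both helper names are hypothetical, and the page size used in the usage note (612 x 792 points, US Letter) is an example value.

```ruby
POINTS_PER_INCH = 72.0

# Converts inches to whole points, since the crop setters take integers.
def inches_to_points(inches)
  (inches * POINTS_PER_INCH).round
end

# Hypothetical helper: returns [x, y, width, height] for the region that
# remains after removing the same margin (in points) from every page edge.
def crop_area_inside_margins(page_width, page_height, margin)
  [margin, margin, page_width - 2 * margin, page_height - 2 * margin]
end
```

For a US Letter page with one-inch margins, `crop_area_inside_margins(612, 792, inches_to_points(1))` yields values that could be splatted into `setCropArea`.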

Miscellaneous

`def setDebugLog(value)`

Turn on debug logging to troubleshoot conversion issues. Details about the conversion process, including resource loading, rendering steps, and error messages, are stored in the debug log. Use this when conversions fail or produce unexpected results. The URL of the log can be obtained from the getDebugLogUrl method and is also available in conversion statistics.

Parameter:

value (bool) - Set to true to enable debug logging.

Default:

false

`def getDebugLogUrl()`

Get the URL of the debug log for the last conversion.

Returns:

string - The link to the debug log.

`def getRemainingCreditCount()`

Get the number of conversion credits available in your account. Use this to monitor your credit usage and implement alerts before running out of credits.
This method can only be called after a call to one of the convertXtoY methods.
The returned value can differ from the actual count if you run parallel conversions.
The special value 999999 is returned if the information is not available.

Returns:

int - The number of credits.
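
An alert based on this value needs to treat the special value 999999 as "unknown" rather than as a large balance. A minimal sketch, with a hypothetical helper name and a caller-chosen threshold:

```ruby
CREDITS_UNKNOWN = 999_999 # special value documented for getRemainingCreditCount

# Hypothetical helper: true when the reported balance is known and below
# the threshold. An unknown balance never triggers the alert.
def low_on_credits?(remaining, threshold)
  return false if remaining == CREDITS_UNKNOWN

  remaining < threshold
end
```

The `remaining` argument would come from `getRemainingCreditCount()` after a conversion has completed.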

`def getConsumedCreditCount()`

Get the number of credits consumed by the last conversion. Use this to track costs per conversion, especially for complex documents or operations that may consume multiple credits.

Returns:

int - The number of credits.

`def getJobId()`

Get the unique job ID for the conversion. Use this to track conversions in your logs, correlate with debug logs, or reference specific conversions when contacting support.

Returns:

string - The unique job identifier.

`def getPageCount()`

Get the number of pages in the output document. Use this to validate conversion results, calculate pagination for user interfaces, or track document complexity metrics.

Returns:

int - The page count.

`def getOutputSize()`

Get the size of the output document in bytes. Use this to check file sizes before delivery, implement size-based quotas, or optimize storage allocation.

Returns:

int - The count of bytes.
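
For logging, the per-conversion getters can be gathered into one hash right after a conversion finishes. This is a sketch assuming `client` responds to the getters documented above; `conversion_stats` and the hash keys are hypothetical names.

```ruby
# Hypothetical helper: snapshots the statistics getters into a single hash
# suitable for structured logging.
def conversion_stats(client)
  {
    job_id: client.getJobId,
    pages: client.getPageCount,
    bytes: client.getOutputSize,
    consumed_credits: client.getConsumedCreditCount
  }
end
```

Recording the job ID alongside the other values makes it easier to correlate a log entry with a debug log or a support request.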

`def getVersion()`

Get the version details including API version, converter version, and client library version. Use this for debugging, logging, or ensuring compatibility when reporting issues.

Returns:

string - API version, converter version, and client version.

`def setTag(tag)`

Tag the conversion with a custom value for tracking and analytics. Use this to categorize conversions by customer ID, document type, or business unit. The tag appears in conversion statistics. A value longer than 32 characters is cut off.

Parameter:

tag - A string with the custom tag.

Example:

Track job in analytics: setTag("client-1234")
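
Since longer tags are cut off, it can be useful to know in advance what will actually be stored. The sketch below assumes the cut-off keeps the first 32 characters, which this reference implies but does not spell out; `effective_tag` is a hypothetical helper.

```ruby
MAX_TAG_LENGTH = 32 # documented limit for setTag values

# Hypothetical helper: returns the part of the tag that fits within the
# limit, assuming truncation keeps the leading characters.
def effective_tag(tag)
  tag.to_s[0, MAX_TAG_LENGTH]
end
```

Comparing `effective_tag(tag)` with `tag` before calling `setTag` is one way to detect tags that would lose information.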

`def setHttpProxy(proxy)`

A proxy server used by the conversion process for accessing the source URLs with HTTP scheme. This can help circumvent regional restrictions or provide limited access to your intranet.

Parameter:

proxy
Constraint:
- The value must have format DOMAIN_OR_IP_ADDRESS:PORT.

Examples:

Corporate proxy server: setHttpProxy("myproxy.com:8080")
Direct IP proxy connection: setHttpProxy("113.25.84.10:33333")

`def setHttpsProxy(proxy)`

A proxy server used by the conversion process for accessing the source URLs with HTTPS scheme. This can help circumvent regional restrictions or provide limited access to your intranet.

Parameter:

proxy
Constraint:
- The value must have format DOMAIN_OR_IP_ADDRESS:PORT.

Examples:

Secure proxy for HTTPS: setHttpsProxy("myproxy.com:443")
Direct secure proxy IP: setHttpsProxy("113.25.84.10:44333")
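
The DOMAIN_OR_IP_ADDRESS:PORT format used by both proxy setters can be checked before the value is passed in. This is a sketch with a hypothetical helper name; it accepts host names and IPv4 addresses only and rejects values that carry a scheme such as "http://".

```ruby
# Hypothetical helper: true when the value looks like HOST:PORT with a
# port between 1 and 65535.
def valid_proxy_address?(proxy)
  match = /\A([A-Za-z0-9.-]+):(\d{1,5})\z/.match(proxy.to_s)
  return false unless match

  match[2].to_i.between?(1, 65_535)
end
```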

API Client Options

`def setUseHttp(value)`

Specify whether to use HTTP or HTTPS when connecting to the PDFCrowd API.

Parameter:

value (bool) - Set to true to use HTTP.

Default:

false

`def setClientUserAgent(agent)`

Specify the User-Agent HTTP header that the client library will use when interacting with the API.

Availability:

API client >= 6.4.0

Parameter:

agent - The user agent string.

`def setUserAgent(agent)`

Deprecated. Replaced with: setClientUserAgent

Set a custom user agent HTTP header. It can be useful if you are behind a proxy or a firewall.

Parameter:

agent - The user agent string.

Default:

pdfcrowd_ruby_client/6.5.4 (https://pdfcrowd.com)

`def setProxy(host, port, user_name, password)`

Specify an HTTP proxy that the API client library will use to connect to the internet.

Parameters:

host - The proxy hostname.
port (int) - The proxy port.
user_name - The username.
password - The password.

`def setRetryCount(count)`

Specify the number of automatic retries when a 502 or 503 HTTP status code is received. These status codes indicate a temporary network issue. The feature can be disabled by setting the count to 0.

Parameter:

count (int) - Number of retries.

Default:

1

Example:

Retry failed requests three times: setRetryCount(3)
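
To make the retry semantics concrete, the following sketch models a request loop that retries only on 502 and 503. It is an illustration of the behaviour described above, not the client library's implementation; `with_retries` is a hypothetical function, and the block stands in for a request that returns an HTTP status code.

```ruby
RETRYABLE_STATUSES = [502, 503].freeze

# Illustrative model: runs the block once, then up to retry_count more times
# while it keeps returning a retryable status. Returns the last status seen.
def with_retries(retry_count)
  attempts = 0
  loop do
    attempts += 1
    status = yield(attempts)
    return status unless RETRYABLE_STATUSES.include?(status) && attempts <= retry_count
  end
end
```

Under this model a retry count of 3 allows up to four requests in total, and a count of 0 means the first 502 or 503 is returned to the caller immediately.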

All methods documented above belong to class PdfToTextClient.

