PDF to Text / Golang Reference

type PdfToTextClient

All setter methods return the *PdfToTextClient object unless specified otherwise.

Constructor

func NewPdfToTextClient(userName string, apiKey string) PdfToTextClient

Constructor for the PDFCrowd API client.

Parameters:
  • userName - Your username at PDFCrowd.
  • apiKey - Your API key.

Conversion Input

func (client *PdfToTextClient) ConvertUrl(url string) ([]byte, error)

Convert a PDF.

Parameter:
  • url - The address of the PDF to convert.
    Constraint:
    • Supported protocols are http:// and https://.
Returns:
[]byte - Byte array containing the conversion output.

func (client *PdfToTextClient) ConvertUrlToStream(url string, outStream io.Writer) error

Convert a PDF and write the result to an output stream.

Parameters:
  • url - The address of the PDF to convert.
    Constraint:
    • Supported protocols are http:// and https://.
  • outStream (io.Writer) - The output stream that will contain the conversion output.

func (client *PdfToTextClient) ConvertUrlToFile(url string, filePath string) error

Convert a PDF and write the result to a local file.

Parameters:
  • url - The address of the PDF to convert.
    Constraint:
    • Supported protocols are http:// and https://.
  • filePath - The output file path.

func (client *PdfToTextClient) ConvertFile(file string) ([]byte, error)

Convert a local file.

Parameter:
  • file - The path to a local file to convert.
    Constraint:
    • The file must exist and not be empty.
Returns:
[]byte - Byte array containing the conversion output.

func (client *PdfToTextClient) ConvertFileToStream(file string, outStream io.Writer) error

Convert a local file and write the result to an output stream.

Parameters:
  • file - The path to a local file to convert.
    Constraint:
    • The file must exist and not be empty.
  • outStream (io.Writer) - The output stream that will contain the conversion output.

func (client *PdfToTextClient) ConvertFileToFile(file string, filePath string) error

Convert a local file and write the result to a local file.

Parameters:
  • file - The path to a local file to convert.
    Constraint:
    • The file must exist and not be empty.
  • filePath - The output file path.

func (client *PdfToTextClient) ConvertRawData(data []byte) ([]byte, error)

Convert raw data.

Parameter:
  • data ([]byte) - The raw content to be converted.
Returns:
[]byte - Byte array with the output.

func (client *PdfToTextClient) ConvertRawDataToStream(data []byte, outStream io.Writer) error

Convert raw data and write the result to an output stream.

Parameters:
  • data ([]byte) - The raw content to be converted.
  • outStream (io.Writer) - The output stream that will contain the conversion output.

func (client *PdfToTextClient) ConvertRawDataToFile(data []byte, filePath string) error

Convert raw data to a file.

Parameters:
  • data ([]byte) - The raw content to be converted.
  • filePath - The output file path.

func (client *PdfToTextClient) ConvertStream(inStream io.Reader) ([]byte, error)

Convert the contents of an input stream.

Parameter:
  • inStream (io.Reader) - The input stream with source data.
Returns:
[]byte - Byte array containing the conversion output.

func (client *PdfToTextClient) ConvertStreamToStream(inStream io.Reader, outStream io.Writer) error

Convert the contents of an input stream and write the result to an output stream.

Parameters:
  • inStream (io.Reader) - The input stream with source data.
  • outStream (io.Writer) - The output stream that will contain the conversion output.

func (client *PdfToTextClient) ConvertStreamToFile(inStream io.Reader, filePath string) error

Convert the contents of an input stream and write the result to a local file.

Parameters:
  • inStream (io.Reader) - The input stream with source data.
  • filePath - The output file path.
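
The stream variants accept any io.Reader and io.Writer, so they compose with files, in-memory buffers, and network connections. The sketch below is a minimal illustration under stated assumptions: streamConverter and fakeConverter are hypothetical names introduced here so the example runs offline. Going by the signature documented above, a *PdfToTextClient should satisfy the same interface.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// streamConverter lists only the method this sketch needs. It mirrors the
// ConvertStreamToStream signature documented above.
type streamConverter interface {
	ConvertStreamToStream(inStream io.Reader, outStream io.Writer) error
}

// fakeConverter is a hypothetical stand-in for the real client. It copies
// the input to the output unchanged and never contacts the API.
type fakeConverter struct{}

func (fakeConverter) ConvertStreamToStream(inStream io.Reader, outStream io.Writer) error {
	_, err := io.Copy(outStream, inStream)
	return err
}

// extractText feeds in-memory data through the converter and collects the
// output in a buffer, checking the returned error before using the result.
func extractText(c streamConverter, data []byte) (string, error) {
	var out bytes.Buffer
	if err := c.ConvertStreamToStream(bytes.NewReader(data), &out); err != nil {
		return "", fmt.Errorf("conversion failed: %w", err)
	}
	return out.String(), nil
}

func main() {
	text, err := extractText(fakeConverter{}, []byte("stand-in for PDF bytes"))
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(text)
}
```

Depending on a small interface rather than the concrete client keeps the calling code testable without network access or API credits.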

General Options

func (client *PdfToTextClient) SetPdfPassword(password string) *PdfToTextClient

The password to open the encrypted PDF file.

Parameter:
  • password - The input PDF password.

func (client *PdfToTextClient) SetPrintPageRange(pages string) *PdfToTextClient

Set the page range to print.

Parameter:
  • pages
    Constraint:
    • A comma separated list of page numbers or ranges.
Examples:
  • Just the second page is printed: SetPrintPageRange("2")
  • The first and the third page are printed: SetPrintPageRange("1,3")
  • Everything except the first page is printed: SetPrintPageRange("2-")
  • Just the first 3 pages are printed: SetPrintPageRange("-3")
  • Pages 3, 6, 7, 8 and 9 are printed: SetPrintPageRange("3,6-9")
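
It can help to check a page range string locally before sending it. The sketch below assumes a grammar inferred only from the examples above (items of the form N, N-, N-M, or -M, separated by commas without spaces); the exact rules the API enforces may differ. validPageRange is a hypothetical helper, not part of the client library.

```go
package main

import (
	"fmt"
	"regexp"
)

// pageRangePattern encodes an ASSUMED grammar inferred from the documented
// examples: items are "N", "N-", "N-M", or "-M", separated by commas.
var pageRangePattern = regexp.MustCompile(`^(\d+-?\d*|-\d+)(,(\d+-?\d*|-\d+))*$`)

// validPageRange reports whether s has the shape of a page range string.
// It checks syntax only; it does not verify that N <= M or that the pages
// exist in the document.
func validPageRange(s string) bool {
	return pageRangePattern.MatchString(s)
}

func main() {
	for _, s := range []string{"2", "1,3", "2-", "-3", "3,6-9", "1,,3", "x"} {
		fmt.Printf("%q valid: %v\n", s, validPageRange(s))
	}
}
```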

func (client *PdfToTextClient) SetNoLayout(value bool) *PdfToTextClient

Ignore the original PDF layout.

Parameter:
  • value (bool) - Set to true to ignore the layout.
    Default:
    false

func (client *PdfToTextClient) SetEol(eol string) *PdfToTextClient

The end-of-line convention for the text output.

Parameter:
  • eol
    Allowed Values:
    • unix — Unix convention "LF" is used.
    • dos — Dos convention "CR LF" is used.
    • mac — Mac convention "CR" is used.
    Default:
    unix
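
When post-processing the text output, the chosen convention determines which byte sequence separates lines. The following sketch maps the three allowed values to their terminators (LF is "\n", CR LF is "\r\n", CR is "\r") and splits text accordingly. eolSequence and splitLines are hypothetical helpers introduced for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// eolSequence maps the documented SetEol values to their line terminators.
// The second result is false for any other value.
func eolSequence(eol string) (string, bool) {
	switch eol {
	case "unix":
		return "\n", true
	case "dos":
		return "\r\n", true
	case "mac":
		return "\r", true
	}
	return "", false
}

// splitLines splits converter output using the terminator for the given
// SetEol value. It returns nil when the value is not one of the three above.
func splitLines(text, eol string) []string {
	sep, ok := eolSequence(eol)
	if !ok {
		return nil
	}
	return strings.Split(text, sep)
}

func main() {
	fmt.Printf("%q\n", splitLines("first\r\nsecond", "dos"))
}
```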

func (client *PdfToTextClient) SetPageBreakMode(mode string) *PdfToTextClient

Specify the page break mode for the text output.

Parameter:
  • mode
    Allowed Values:
    • none — No page breaks are inserted.
    • default — The standard page break code "FF" is used.
    • custom — A custom page break is used.
    Default:
    none
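
With page breaks enabled, the output can be cut back into pages on the receiving side. The sketch below is one way to do that, assuming the break string appears only between pages and never inside page text, and that the "FF" code of the default mode is the form feed character "\f". splitPages and the sample output string are made up for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// splitPages cuts converter output into pages at each occurrence of the
// page break string. Pass "\f" (form feed) for the "default" mode, or the
// string given to SetCustomPageBreak for the "custom" mode.
func splitPages(text, pageBreak string) []string {
	if pageBreak == "" {
		// Mode "none": there is nothing to split on.
		return []string{text}
	}
	return strings.Split(text, pageBreak)
}

func main() {
	output := "page one\n\fpage two\n\fpage three\n" // simulated output
	for i, page := range splitPages(output, "\f") {
		fmt.Printf("page %d: %q\n", i+1, page)
	}
}
```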

func (client *PdfToTextClient) SetCustomPageBreak(pageBreak string) *PdfToTextClient

Specify the custom page break.

Parameter:
  • pageBreak - String to insert between the pages.
Examples:
  • Clear text between pages: SetCustomPageBreak("END OF PAGE")
  • Visual separator with line break
    SetCustomPageBreak("----my page break----\n")

func (client *PdfToTextClient) SetParagraphMode(mode string) *PdfToTextClient

Specify the paragraph detection mode.

Parameter:
  • mode
    Allowed Values:
    • none — No paragraph detection.
    • bounding-box — Paragraph detection based on line bounding boxes.
    • characters — Paragraph detection based on the number of characters in the line.
    Default:
    none

func (client *PdfToTextClient) SetLineSpacingThreshold(threshold string) *PdfToTextClient

Set the maximum line spacing when the paragraph detection mode is enabled.

Parameter:
  • threshold
    Constraint:
    • The value must be a positive integer percentage.
    Default:
    10%
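
Because the threshold is passed as a string, a small helper can build it from an integer and reject non-positive values up front. This is a sketch only: the "N%" shape is assumed from the documented default of 10%, and formatThreshold is a hypothetical name.

```go
package main

import (
	"fmt"
	"strconv"
)

// formatThreshold renders a percentage as "N%". The "N%" shape is ASSUMED
// from the documented default "10%".
func formatThreshold(percent int) (string, error) {
	if percent <= 0 {
		return "", fmt.Errorf("threshold must be a positive integer, got %d", percent)
	}
	return strconv.Itoa(percent) + "%", nil
}

func main() {
	s, err := formatThreshold(15)
	if err != nil {
		fmt.Println(err)
		return
	}
	// The result is the kind of value SetLineSpacingThreshold expects.
	fmt.Println(s)
}
```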

func (client *PdfToTextClient) SetRemoveHyphenation(value bool) *PdfToTextClient

Remove the hyphen character from the end of lines.

Parameter:
  • value (bool) - Set to true to remove hyphens.
    Default:
    false

func (client *PdfToTextClient) SetRemoveEmptyLines(value bool) *PdfToTextClient

Remove empty lines from the text output.

Parameter:
  • value (bool) - Set to true to remove empty lines.
    Default:
    false

func (client *PdfToTextClient) SetCropAreaX(x int) *PdfToTextClient

Set the top left X coordinate of the crop area in points.

Parameter:
  • x (int)
    Constraint:
    • Must be a positive integer or 0.
Example:
  • Start extraction at 1.4 inches from left: SetCropAreaX(100)

func (client *PdfToTextClient) SetCropAreaY(y int) *PdfToTextClient

Set the top left Y coordinate of the crop area in points.

Parameter:
  • y (int)
    Constraint:
    • Must be a positive integer or 0.
Example:
  • Start extraction at 1.4 inches from top: SetCropAreaY(100)

func (client *PdfToTextClient) SetCropAreaWidth(width int) *PdfToTextClient

Set the width of the crop area in points.

Parameter:
  • width (int)
    Constraint:
    • Must be a positive integer or 0.
    Default:
    PDF page width.
Example:
  • Extract narrow 1.4 inch width: SetCropAreaWidth(100)

func (client *PdfToTextClient) SetCropAreaHeight(height int) *PdfToTextClient

Set the height of the crop area in points.

Parameter:
  • height (int)
    Constraint:
    • Must be a positive integer or 0.
    Default:
    PDF page height.
Example:
  • Extract small 1.4 inch height: SetCropAreaHeight(100)

func (client *PdfToTextClient) SetCropArea(x int, y int, width int, height int) *PdfToTextClient

Set the crop area. It lets you extract just a part of a PDF page.

Parameters:
  • x (int) - Set the top left X coordinate of the crop area in points.
    Constraint:
    • Must be a positive integer or 0.
  • y (int) - Set the top left Y coordinate of the crop area in points.
    Constraint:
    • Must be a positive integer or 0.
  • width (int) - Set the width of the crop area in points.
    Constraint:
    • Must be a positive integer or 0.
    Default:
    PDF page width.
  • height (int) - Set the height of the crop area in points.
    Constraint:
    • Must be a positive integer or 0.
    Default:
    PDF page height.
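
Crop values are given in points, and one inch is 72 points, which is why 100 points is about 1.4 inches in the examples above. The sketch below converts inches to points and checks the "positive integer or 0" constraint before the values are passed on. cropArea, inchesToPoints, and validate are hypothetical helpers, not part of the client library.

```go
package main

import (
	"fmt"
	"math"
)

// pointsPerInch is the standard PDF unit conversion.
const pointsPerInch = 72.0

// inchesToPoints converts inches to the nearest whole number of points.
func inchesToPoints(inches float64) int {
	return int(math.Round(inches * pointsPerInch))
}

// cropArea holds the four values that SetCropArea takes, in points.
type cropArea struct {
	X, Y, Width, Height int
}

// validate enforces the documented constraint that every value is 0 or positive.
func (c cropArea) validate() error {
	if c.X < 0 || c.Y < 0 || c.Width < 0 || c.Height < 0 {
		return fmt.Errorf("crop area values must be 0 or positive: %+v", c)
	}
	return nil
}

func main() {
	// A 6.5 x 9 inch region starting 1 inch from the top left corner.
	area := cropArea{
		X:      inchesToPoints(1),
		Y:      inchesToPoints(1),
		Width:  inchesToPoints(6.5),
		Height: inchesToPoints(9),
	}
	if err := area.validate(); err != nil {
		fmt.Println(err)
		return
	}
	// Prints: SetCropArea(72, 72, 468, 648)
	fmt.Printf("SetCropArea(%d, %d, %d, %d)\n", area.X, area.Y, area.Width, area.Height)
}
```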

Miscellaneous

func (client *PdfToTextClient) SetDebugLog(value bool) *PdfToTextClient

Turn on the debug logging. Details about the conversion are stored in the debug log. The URL of the log can be obtained from the GetDebugLogUrl method and is also available in conversion statistics.

Parameter:
  • value (bool) - Set to true to enable the debug logging.
    Default:
    false

func (client *PdfToTextClient) GetDebugLogUrl() string

Get the URL of the debug log for the last conversion.

Returns:
string - The link to the debug log.

func (client *PdfToTextClient) GetRemainingCreditCount() int

Get the number of conversion credits available in your account.
This method can only be called after a call to one of the ConvertXToY methods.
The returned value can differ from the actual count if you run parallel conversions.
The special value 999999 is returned if the information is not available.

Returns:
int - The number of credits.
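
Since 999999 is a sentinel rather than a real balance, callers may want to separate the two cases before acting on the number. A minimal sketch, with remainingCredits as a hypothetical wrapper around the raw value:

```go
package main

import "fmt"

// creditsUnknown is the special value documented above.
const creditsUnknown = 999999

// remainingCredits interprets a raw GetRemainingCreditCount result.
// known is false when the API could not report the count.
func remainingCredits(raw int) (credits int, known bool) {
	if raw == creditsUnknown {
		return 0, false
	}
	return raw, true
}

func main() {
	for _, raw := range []int{1500, 999999} { // simulated return values
		if n, ok := remainingCredits(raw); ok {
			fmt.Printf("%d credits left\n", n)
		} else {
			fmt.Println("credit count not available")
		}
	}
}
```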

func (client *PdfToTextClient) GetConsumedCreditCount() int

Get the number of credits consumed by the last conversion.

Returns:
int - The number of credits.

func (client *PdfToTextClient) GetJobId() string

Get the job id.

Returns:
string - The unique job identifier.

func (client *PdfToTextClient) GetPageCount() int

Get the number of pages in the output document.

Returns:
int - The page count.

func (client *PdfToTextClient) GetOutputSize() int

Get the size of the output in bytes.

Returns:
int - The count of bytes.

func (client *PdfToTextClient) GetVersion() string

Get the version details.

Returns:
string - API version, converter version, and client version.

func (client *PdfToTextClient) SetTag(tag string) *PdfToTextClient

Tag the conversion with a custom value. The tag is used in conversion statistics. A value longer than 32 characters is cut off.

Parameter:
  • tag - A string with the custom tag.
Example:
  • Track job in analytics: SetTag("client-1234")
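
Because longer tags are silently cut off, a length check before the call avoids surprises in the statistics. The sketch below counts Unicode code points; whether the service counts characters or bytes is not stated above, so that choice is an assumption. tagFits is a hypothetical helper.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// maxTagLength is the documented limit for SetTag values.
const maxTagLength = 32

// tagFits reports whether tag is within the 32 character limit.
// Counting code points rather than bytes is an ASSUMPTION.
func tagFits(tag string) bool {
	return utf8.RuneCountInString(tag) <= maxTagLength
}

func main() {
	for _, tag := range []string{"client-1234", "a-very-long-tag-that-exceeds-the-documented-limit"} {
		fmt.Printf("%q fits: %v\n", tag, tagFits(tag))
	}
}
```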

func (client *PdfToTextClient) SetHttpProxy(proxy string) *PdfToTextClient

A proxy server used by the conversion process for accessing the source URLs with HTTP scheme. It can help to circumvent regional restrictions or provide limited access to your intranet.

Parameter:
  • proxy
    Constraint:
    • The value must have format DOMAIN_OR_IP_ADDRESS:PORT.
Examples:
  • Corporate proxy server: SetHttpProxy("myproxy.com:8080")
  • Direct IP proxy connection: SetHttpProxy("113.25.84.10:33333")

func (client *PdfToTextClient) SetHttpsProxy(proxy string) *PdfToTextClient

A proxy server used by the conversion process for accessing the source URLs with HTTPS scheme. It can help to circumvent regional restrictions or provide limited access to your intranet.

Parameter:
  • proxy
    Constraint:
    • The value must have format DOMAIN_OR_IP_ADDRESS:PORT.
Examples:
  • Secure proxy for HTTPS: SetHttpsProxy("myproxy.com:443")
  • Direct secure proxy IP: SetHttpsProxy("113.25.84.10:44333")
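
The DOMAIN_OR_IP_ADDRESS:PORT format required by SetHttpProxy and SetHttpsProxy can be checked locally with the standard library. The sketch below is an approximation: validProxy is a hypothetical helper, and the port range 1 to 65535 is the general TCP range, not a limit stated in this reference.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// validProxy reports whether s has the shape HOST:PORT with a non-empty
// host and a numeric port in 1..65535. A scheme prefix such as "http://"
// is rejected because the documented format does not include one.
func validProxy(s string) bool {
	host, port, err := net.SplitHostPort(s)
	if err != nil || host == "" {
		return false
	}
	n, err := strconv.Atoi(port)
	return err == nil && n >= 1 && n <= 65535
}

func main() {
	for _, p := range []string{"myproxy.com:8080", "113.25.84.10:33333", "myproxy.com", "http://myproxy.com:8080"} {
		fmt.Printf("%q valid: %v\n", p, validProxy(p))
	}
}
```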

API Client Options

func (client *PdfToTextClient) SetUseHttp(value bool) *PdfToTextClient

Specify whether to use HTTP or HTTPS when connecting to the PDFCrowd API.

Parameter:
  • value (bool) - Set to true to use HTTP.
    Default:
    false

func (client *PdfToTextClient) SetClientUserAgent(agent string) *PdfToTextClient

Specifies the User-Agent HTTP header that the client library will use when interacting with the API.

Availability:
API client >= 6.4.0
Parameter:
  • agent - The user agent string.

func (client *PdfToTextClient) SetUserAgent(agent string) *PdfToTextClient

Deprecated. Replaced with: SetClientUserAgent

Set a custom user agent HTTP header. It can be useful if you are behind a proxy or a firewall.

Parameter:
  • agent - The user agent string.
    Default:
    pdfcrowd_go_client/6.5.2 (https://pdfcrowd.com)

func (client *PdfToTextClient) SetProxy(host string, port int, userName string, password string) *PdfToTextClient

Specifies an HTTP proxy that the API client library will use to connect to the internet.

Parameters:
  • host - The proxy hostname.
  • port (int) - The proxy port.
  • userName - The username.
  • password - The password.

func (client *PdfToTextClient) SetRetryCount(count int) *PdfToTextClient

Specifies the number of automatic retries when a 502 or 503 HTTP status code is received. These status codes indicate a temporary network issue. This feature can be disabled by setting the count to 0.

Parameter:
  • count (int) - Number of retries.
    Default:
    1
Example:
  • Retry failed requests three times: SetRetryCount(3)
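
To make the retry semantics concrete: a count of N means one initial attempt plus up to N further attempts while the status stays 502 or 503. The sketch below illustrates that behavior with simulated status codes. It is not the library's implementation, and withRetry is a hypothetical function.

```go
package main

import "fmt"

// withRetry calls attempt once, then up to retries more times while the
// returned status is 502 or 503. It returns the last status and the number
// of calls made.
func withRetry(retries int, attempt func() int) (status, calls int) {
	for {
		status = attempt()
		calls++
		temporary := status == 502 || status == 503
		if !temporary || calls > retries {
			return status, calls
		}
	}
}

func main() {
	responses := []int{503, 502, 200} // simulated status codes
	i := 0
	status, calls := withRetry(3, func() int {
		s := responses[i]
		i++
		return s
	})
	fmt.Println(status, calls) // prints "200 3"
}
```

With a count of 0 the function makes exactly one call, which matches the documented way to disable retries.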