

PDF to Text (pdf-text)

Extract text content from PDF documents using auto or render-based strategies.

Transform binary json

Minimal example

actions:
  - pdf-text: {}

JSON

{
  "actions": [
    {
      "pdf-text": {}
    }
  ]
}


Advanced

| Field | Type | Required | Description |
|---|---|---|---|
| emit-document-events | boolean (bool) | No | Emit a document-level event alongside the per-page output. Default: `false`. |
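A minimal sketch of turning on document-level events, using the same `actions` layout as the minimal example. The shape of the emitted document event is not described on this page, so the comment only restates the field description.

```yaml
actions:
  - pdf-text:
      # Emit one document-level event in addition to the per-page output
      # (the default is false).
      emit-document-events: true
```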

Extraction

| Field | Type | Required | Description |
|---|---|---|---|
| strategy | string | No | Extraction strategy to apply, e.g. `auto`. |
| max-pages | number (integer) | No | Maximum number of pages to analyze before stopping. Example: `42`. |
| page-ranges | string | No | Pages to extract, given as a range spec such as `"1-3,7"`. |
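A sketch combining the extraction fields. `auto` and the range spec `"1-3,7"` come from the field reference; reading `"1-3,7"` as "pages 1 through 3 plus page 7" is an assumption based on the usual range syntax, and the `max-pages` value is illustrative. How `max-pages` interacts with `page-ranges` when both are set is not specified here.

```yaml
actions:
  - pdf-text:
      strategy: auto         # the only strategy value named in the reference
      page-ranges: "1-3,7"   # presumably pages 1-3 and page 7; quoted to keep it a string
      max-pages: 10          # illustrative cap on the number of pages analyzed
```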

General

| Field | Type | Required | Description |
|---|---|---|---|
| description | string | No | Short summary displayed in the editor. |
| condition | lua-expression (string) | No | Lua expression that gates whether extraction runs. Example: `2 * count()`. |
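A sketch of the general fields. The condition below is a deliberately trivial Lua expression that always evaluates to `true`; real conditions would reference values or functions exposed by the pipeline (such as the `count()` in the reference example), which this page does not document. The description text is a made-up placeholder.

```yaml
actions:
  - pdf-text:
      description: Extract text from uploaded PDFs   # hypothetical editor label
      # Trivial Lua expression that always evaluates to true, so extraction
      # always runs; pipeline-provided variables are not documented here.
      condition: "true"
```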

Quality Filters

| Field | Type | Required | Description |
|---|---|---|---|
| min-text-ratio | number (string) | No | Minimum ratio of text characters for a page to be considered non-garbled. Examples: `42`, `1.2e-10`. |
| min-avg-word-len | number (string) | No | Minimum average word length for a page to be considered non-garbled. Examples: `42`, `1.2e-10`. |
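A sketch of setting both quality thresholds. Because the reference types them as "number (string)", the values are quoted here; whether unquoted numbers are also accepted is not stated. The threshold values are illustrative assumptions, not recommended defaults, and what happens to a page that fails a filter (dropped or flagged) is not described on this page.

```yaml
actions:
  - pdf-text:
      # Both fields are typed "number (string)", so the values are quoted.
      # Illustrative thresholds only.
      min-text-ratio: "0.8"
      min-avg-word-len: "3"
```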

Rendering

| Field | Type | Required | Description |
|---|---|---|---|
| dpi | number (integer) | No | Rendering resolution in DPI, used when extraction is render-based. Example: `42`. |
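A sketch of setting the rendering resolution. The identifier of the render-based strategy is not listed in the field reference, so the example keeps `strategy: auto`; it is an assumption that `auto` may fall back to rendering and therefore use `dpi`. The value 300 is illustrative.

```yaml
actions:
  - pdf-text:
      strategy: auto
      # Only takes effect when render-based extraction is used; whether
      # `auto` switches to rendering on its own is not documented here.
      dpi: 300
```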