PDF to Text (pdf-text)
Extract text content from PDF documents using auto or render-based strategies.
Transform binary json
Minimal example
YAML

    actions:
      - pdf-text: {}

JSON

    {
      "actions": [
        { "pdf-text": {} }
      ]
    }

Advanced
| Field | Type | Required | Description |
|---|---|---|---|
| emit-document-events ✓ | boolean (bool) | | Emit a document-level event alongside per-page output. Default: false |
Extraction
| Field | Type | Required | Description |
|---|---|---|---|
| strategy | string | | Strategy to apply: auto |
| max-pages | number (integer) | | Maximum number of pages to analyze before stopping. Example: 42 |
| page-ranges | string | | Page range specification, e.g. "1-3,7". |
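
To make the page-ranges format concrete, here is a minimal Python sketch of how a spec such as "1-3,7" could expand into page numbers, with max-pages applied as a cap. It assumes 1-based, inclusive ranges and that max-pages truncates the selection; this reference confirms neither, and `parse_page_ranges` is a hypothetical helper, not part of pdf-text.

```python
def parse_page_ranges(spec, max_pages=None):
    """Expand a spec like "1-3,7" into a sorted list of page numbers.

    Assumptions (not confirmed by the pdf-text reference): pages are
    1-based, ranges are inclusive, and max_pages caps the result.
    """
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas such as "1-3,,7"
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        if start < 1 or end < start:
            raise ValueError(f"invalid page range: {part!r}")
        pages.update(range(start, end + 1))
    ordered = sorted(pages)
    return ordered if max_pages is None else ordered[:max_pages]


print(parse_page_ranges("1-3,7"))               # [1, 2, 3, 7]
print(parse_page_ranges("1-3,7", max_pages=2))  # [1, 2]
```

Overlapping entries such as "1-3,2-4" collapse to a single set of pages in this sketch; the action itself may treat them differently.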
General
| Field | Type | Required | Description |
|---|---|---|---|
| description | string | | Short summary displayed in the editor. |
| condition | lua-expression (string) | | Conditional expression that gates whether extraction runs. Example: 2 * count() |
Quality Filters
| Field | Type | Required | Description |
|---|---|---|---|
| min-text-ratio | number (string) | | Minimum ratio of textual characters for a page to be considered non-garbled. Examples: 42, 1.2e-10 |
| min-avg-word-len | number (string) | | Minimum average word length for a page to be considered non-garbled. Examples: 42, 1.2e-10 |
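
The two filters can be read as simple per-page heuristics. The Python sketch below shows one plausible way to compute them; it is an illustration only. How pdf-text actually defines "textual characters" and "words" is not stated here, the function names are hypothetical, and the thresholds 0.6 and 2.5 are made-up values rather than the action's defaults.

```python
def text_ratio(text):
    """Fraction of non-whitespace characters that are letters or digits."""
    visible = [ch for ch in text if not ch.isspace()]
    if not visible:
        return 0.0
    return sum(ch.isalnum() for ch in visible) / len(visible)


def avg_word_len(text):
    """Mean length of whitespace-separated tokens."""
    words = text.split()
    if not words:
        return 0.0
    return sum(len(w) for w in words) / len(words)


def looks_garbled(text, min_text_ratio=0.6, min_avg_word_len=2.5):
    """True if either metric falls below its (illustrative) threshold."""
    return (text_ratio(text) < min_text_ratio
            or avg_word_len(text) < min_avg_word_len)


print(looks_garbled("Quarterly revenue rose in every region."))  # False
print(looks_garbled("# @ ~ | ^ % $ & * ! ? ="))                  # True
```

A page that fails such a check is the kind of input for which a render-based strategy would be the natural fallback, though this reference does not describe what the action does with a garbled page.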
Rendering
| Field | Type | Required | Description |
|---|---|---|---|
| dpi | number (integer) | | Rendering DPI when using render-based extraction. Example: 42 |
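
When choosing a dpi value, it helps to see how it translates into image size. PDF page dimensions are expressed in points, with 72 points per inch by default, so the pixel size of a rendered page scales linearly with DPI. The sketch below is general arithmetic, not pdf-text's renderer; `rendered_size` is a hypothetical helper, and the action's exact rounding is not documented here.

```python
POINTS_PER_INCH = 72  # default PDF user-space units per inch


def rendered_size(width_pt, height_pt, dpi):
    """Pixel dimensions of a page rasterized at the given DPI."""
    width_px = round(width_pt * dpi / POINTS_PER_INCH)
    height_px = round(height_pt * dpi / POINTS_PER_INCH)
    return width_px, height_px


# US Letter is 612 x 792 points (8.5 x 11 inches).
print(rendered_size(612, 792, 150))  # (1275, 1650)
print(rendered_size(612, 792, 300))  # (2550, 3300)
```

Doubling the DPI quadruples the pixel count per page, so higher values trade memory and processing time for sharper input to any render-based text recognition.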