PDF to Text (`pdf-text`)

Extract text content from PDF documents using either the automatic (`auto`) strategy or render-based extraction.

Transform binary json

```yaml
actions:
  - pdf-text: {}
```

The same configuration in JSON:

```json
{
  "actions": [
    {
      "pdf-text": {}
    }
  ]
}
```

Field	Type	Required	Description
`emit-document-events` ✓	`boolean` (`bool`)		Emit a document-level event alongside per-page output. Default: `false`
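The following Python sketch illustrates what `emit-document-events` toggles. It assumes, hypothetically, that each page produces one event and that the document-level event carries the concatenated page text. The event shapes and the `emit_events` helper are illustrative only, not the action's actual output format.

```python
from typing import Any, Dict, Iterator, List


def emit_events(pages: List[str], emit_document_events: bool = False) -> Iterator[Dict[str, Any]]:
    """Yield one event per page, plus an optional document-level event.

    Hypothetical event shapes; the real pdf-text output may differ.
    """
    for number, text in enumerate(pages, start=1):
        # Per-page output is always produced (1-based page numbers assumed).
        yield {"type": "page", "page": number, "text": text}
    if emit_document_events:
        # Extra event summarizing the whole document, emitted after the pages.
        yield {"type": "document", "page-count": len(pages), "text": "\n\n".join(pages)}


# With the default (false), only per-page events are produced.
print([e["type"] for e in emit_events(["first page", "second page"])])
# ['page', 'page']

# With the flag enabled, a document event follows the page events.
print([e["type"] for e in emit_events(["first page", "second page"], emit_document_events=True)])
# ['page', 'page', 'document']
```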

Field	Type	Description
`strategy`	`string`	Extraction strategy to apply: `auto`.
`max-pages`	`number` (`integer`)	Maximum number of pages to analyze before stopping. Example: `42`
`page-ranges`	`string`	Page range specification, e.g. `1-3,7`.
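A minimal Python sketch of how a `page-ranges` spec such as `1-3,7` could be expanded and then capped by `max-pages`. The semantics here are assumptions, not documented behavior: ranges are 1-based and inclusive, pages beyond the end of the document are dropped, duplicates are removed in first-seen order, and `max-pages` is applied after range selection. `parse_page_ranges` is a hypothetical helper.

```python
from typing import List, Optional


def parse_page_ranges(spec: str, page_count: int, max_pages: Optional[int] = None) -> List[int]:
    """Expand a spec like "1-3,7" into 1-based page numbers (assumed semantics)."""
    pages: List[int] = []
    seen = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas such as "1,,3"
        if "-" in part:
            start_s, end_s = part.split("-", 1)
            start, end = int(start_s), int(end_s)
            if start > end:
                raise ValueError(f"descending range: {part!r}")
        else:
            start = end = int(part)
        for page in range(start, end + 1):
            # Keep only pages that exist, and each page at most once.
            if 1 <= page <= page_count and page not in seen:
                seen.add(page)
                pages.append(page)
    if max_pages is not None:
        pages = pages[:max_pages]  # hypothetical: cap applied after selection
    return pages


print(parse_page_ranges("1-3,7", page_count=10))               # [1, 2, 3, 7]
print(parse_page_ranges("1-3,7", page_count=10, max_pages=2))  # [1, 2]
```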

Field	Type	Required	Description
`description`	`string`		Short summary displayed in the editor.
`condition`	`lua-expression` (`string`)		Conditional expression that gates whether extraction runs. Example: `2 * count()`

Field	Type	Required	Description
`min-text-ratio`	`number` (`string`)		Minimum ratio of textual characters for a page to be considered non-garbled. Examples: `42`, `1.2e-10`
`min-avg-word-len`	`number` (`string`)		Minimum average word length for a page to be considered non-garbled. Examples: `42`, `1.2e-10`
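The two thresholds above suggest a heuristic along these lines. This Python sketch is hypothetical: it assumes "textual characters" means letters and digits among the non-whitespace characters, that words are whitespace-separated tokens, and that a page failing either threshold counts as garbled. It accepts the thresholds as strings because the table lists their encoding as `string`. One plausible use, which is an inference rather than documented behavior, is for the `auto` strategy to fall back to render-based extraction when a page looks garbled.

```python
from typing import Union

Number = Union[float, int, str]  # thresholds may arrive string-encoded


def text_ratio(text: str) -> float:
    """Share of non-whitespace characters that are letters or digits (assumed definition)."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    return sum(c.isalnum() for c in chars) / len(chars)


def avg_word_len(text: str) -> float:
    """Mean length of whitespace-separated tokens; 0.0 for an empty page."""
    words = text.split()
    if not words:
        return 0.0
    return sum(len(w) for w in words) / len(words)


def looks_garbled(text: str, min_text_ratio: Number, min_avg_word_len: Number) -> bool:
    """Flag a page as garbled if it falls below either threshold (hypothetical rule)."""
    return (
        text_ratio(text) < float(min_text_ratio)
        or avg_word_len(text) < float(min_avg_word_len)
    )


print(looks_garbled("Hello, world", "0.8", "3"))  # False: ratio 10/11, average length 5.5
print(looks_garbled("#$% ^&* ()!", 0.8, 3))        # True: no letters or digits
```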

Field	Type	Required	Description
`dpi`	`number` (`integer`)		Rendering DPI when using render-based extraction. Example: `42`
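PDF page sizes are expressed in points, which by default are 1/72 inch, so `dpi` determines the pixel dimensions of each rendered page: pixels = points × dpi / 72. The Python sketch below shows that arithmetic. Rounding up to whole pixels is an assumption, and `render_size` is a hypothetical helper rather than part of the action.

```python
import math
from typing import Tuple

POINTS_PER_INCH = 72  # default PDF user-space unit: 1 point = 1/72 inch


def render_size(width_pt: float, height_pt: float, dpi: int) -> Tuple[int, int]:
    """Pixel dimensions of a page rendered at `dpi`, rounded up (assumed rounding)."""
    return (
        math.ceil(width_pt * dpi / POINTS_PER_INCH),
        math.ceil(height_pt * dpi / POINTS_PER_INCH),
    )


# US Letter is 612 x 792 points (8.5 x 11 inches).
print(render_size(612, 792, 144))  # (1224, 1584)
print(render_size(612, 792, 300))  # (2550, 3300)
```

Because both dimensions scale linearly with `dpi`, doubling it quadruples the pixel count per page, which is the main cost to weigh when choosing a value for render-based extraction.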