# Transforming Data
LyftData jobs transform events between one input and one output. This guide helps you choose the right action chain, apply AI where it adds value, and harden pipelines before production.
## Choose the right transformation path
| Goal | Recommended action chain | Edition |
|---|---|---|
| Parse logs into fields | extract -> key-value -> convert | Both |
| Normalize JSON payloads | json -> flatten -> rename -> remove | Both |
| Join reference data | enrich -> time | Both |
| Prepare documents for AI workflows | pdf-text or docx-to-text -> chunk -> tokenize | Both |
| Generate structured model output | chunk -> infer -> assert | infer is Enterprise |
| Embed and cluster records | infer (embedding) -> cluster | infer is Enterprise |
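As an illustration of the "Normalize JSON payloads" row, such a chain might be sketched as follows. Only `actions` and the action names come from this guide; the per-action option names and field names are assumptions, so check each action's page in the DSL index for the exact fields.

```yaml
# Sketch of a json -> flatten -> rename -> remove chain.
# Option names below are assumptions, not confirmed DSL fields.
actions:
  - json: {}            # parse the payload as JSON
  - flatten: {}         # collapse nested objects into top-level fields
  - rename:
      fields:           # assumed shape: old-name -> new-name
        user.name: username
  - remove:
      fields: [user.internal_id]   # assumed option name
```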
## Sequence actions for predictable behavior
Use this order unless you have a specific reason not to:
- Parse (`json`, `csv`, `xml`, `extract`).
- Shape fields (`rename`, `copy`, `remove`, `flatten`, `convert`).
- Enrich (`add`, `enrich`, `time`, optional `script`).
- Validate and gate (`filter`, `assert`, `abort`).
- Route signals (`message`) and deliver via the output.
This keeps validation close to the final payload shape and reduces hard-to-debug side effects.
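The ordering above can be sketched as a job fragment. The `actions` list, `add.kv-pairs`, `filter.condition`, and `output.file` fields follow the examples elsewhere in this guide; the remaining option names and all values are illustrative assumptions.

```yaml
# Illustrative ordering only; options marked "assumed" are not confirmed DSL fields.
actions:
  - json: {}                        # 1. Parse the raw payload
  - remove:
      fields: [raw, _tmp]           # 2. Shape: drop scratch fields (assumed option)
  - add:
      kv-pairs:
        source: web                 # 3. Enrich: constant default value
  - filter:
      condition: "status ~= ''"     # 4. Validate and gate near the final shape
output:
  file:
    path: /tmp/out.jsonl            # 5. Deliver
    input-field: payload
```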
## AI-assisted transformation patterns
### Structured extraction with guardrails
```yaml
actions:
  - chunk:
      input-field: body
      output-field: chunks
  - infer:
      workload:
        llm-completion:
          llm:
            provider: openai-compat
            model: your-model
          input-field: chunks
          response-field: ai_result
          response-format: json
          prompt:
          schema: '{"type":"object"}'
          timeout-ms: 15000
          on-error: dlq:ai_failures
  - assert:
      behaviour: drop-on-failure
      schema:
        schema-string: '{"type":"object"}'
```

### Embedding pipeline for downstream analytics
```yaml
actions:
  - infer:
      workload:
        embedding:
          embedding:
            provider: openai-compat
            model: your-embedding-model
          input-field: text
          response-field: vector
  - cluster:
      input-field: vector
      output-field: cluster_id
```

## Defaults, filters, and empty events
When you write line-oriented outputs (for example CSV or text) through `output.file` with `input-field`, treat missing, null, and empty values explicitly:
- Missing or null `input-field`: LyftData emits nothing (no line is written).
- Empty payload events (for example events dropped or filtered earlier): LyftData emits nothing.
- Empty strings (`""`) are valid values: a blank line is written.
Recommended pattern:
- Use `add` to set default values before output when a field must always exist.
- Use `filter` to drop rows you do not want (for example empty-string rows).
- Use `csv-stringify` to create correctly escaped CSV rows, then point `output.file.input-field` at that generated field.
Example:
```yaml
actions:
  - add:
      kv-pairs:
        csv_row: ""
  - csv-stringify:
      fields: [name, price, category]
      output-field: csv_row
  - filter:
      condition: "csv_row ~= ''"
output:
  file:
    path: /tmp/export.csv
    input-field: csv_row
```

## Production guardrails
- Treat `script` as an escape hatch; prefer declarative actions first.
- Add explicit conversion and validation behavior (`convert` plus `assert`/`filter`) so bad events fail predictably.
- For `infer`, always set `timeout-ms`, `rate-limit`, `concurrency`, and `on-error` explicitly.
- Use `response-format: json` and a schema when model output feeds downstream systems.
- Keep secrets in variables such as `${dyn|OPENAI_API_KEY}`; never inline them in job definitions.
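Applying these guardrails to an `infer` step might look like the sketch below. The structure follows the structured-extraction example earlier in this guide; the `api-key` option and the units for `rate-limit` and `concurrency` are assumptions, and all values are illustrative.

```yaml
# Guardrailed infer step. "api-key" and the rate-limit/concurrency
# placement and units are assumptions; verify against the infer action page.
actions:
  - infer:
      workload:
        llm-completion:
          llm:
            provider: openai-compat
            model: your-model
            api-key: ${dyn|OPENAI_API_KEY}   # secret via variable, never inline
          input-field: chunks
          response-field: ai_result
          response-format: json              # pair with a schema downstream
          schema: '{"type":"object"}'
          timeout-ms: 15000                  # fail fast on slow providers
          rate-limit: 10                     # assumed: requests per second
          concurrency: 4                     # assumed: parallel in-flight requests
          on-error: dlq:ai_failures          # route failures instead of dropping silently
```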
## Run & Trace checklist
- Verify each step changes only the fields you expect.
- Check field types after `convert` and `time`, not just field names.
- Test malformed inputs to validate failure behavior (`drop`, `abort`, or DLQ).
- For AI jobs, inspect response-shape stability across multiple samples and confirm latency under realistic payload sizes.
For full field-level options, use the DSL index and open the linked action pages.