# File Handling
LyftData can write data to two broad classes of destinations:
- Log files via the `file` output (newline-delimited events written to the local filesystem)
- Block stores / object stores via outputs like `file-store`, `s3`, `gcs`, `azure-blob`, and `web-dav-store` (objects written per batch; no append)
These can look similar in the editor, but their on-disk (and on-cloud) semantics differ in important ways.
## Pick the right output
| You need | Prefer | Why |
|---|---|---|
| Append events to a local file you can tail | `file` | Writes one line per event and keeps a single growing file. |
| Durable, partitioned exports to a “bucket” | Block store outputs (`s3`, `gcs`, `azure-blob`, `file-store`, `web-dav-store`) | Writes whole objects; supports batching and preprocessors; avoids “append to object” pitfalls. |
| “Object store semantics” on disk | `file-store` | Mirrors cloud object stores while writing to a local directory. |
## Where data is written
All outputs run on the worker executing the job:
- If you run on the built-in worker, paths are on the server host/container.
- If you run on an external worker, paths are on that worker host/container.
Use absolute paths and ensure the target directory is persistent (volume-mounted in containers) and writable.
## Log Files (`file`): append vs. overwrite (and flushing)
The Log Files output writes one line per event (NDJSON-style when writing full JSON events).
- Default behavior is append: `file-per-event: false` appends to an existing file (and creates it if missing).
- `file-per-event: true` writes each event by opening the path for a fresh write, so a constant `path` will be overwritten repeatedly. Use this only with a unique `path` (for example, by including `${}` expansions in the filename).
- `compress-after: true` gzips the previous file when the expanded path changes (and removes the uncompressed file).
- `truncate: true` truncates a file the first time it is seen during a run (useful when re-running a job into the same path).
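Taken together, a sketch of a `file` output using these fields (the path is illustrative, and the nesting under `output:` mirrors the example at the end of this page; check your job schema):

```yaml
output:
  file:
    # Expanded per event; a unique-per-host path keeps appends separate.
    path: /var/log/lyftdata/exports/${host||unassigned}.ndjson
    file-per-event: false   # append mode (the default)
    truncate: true          # truncate the first time each expanded path is seen this run
    compress-after: true    # gzip the previous file once the expanded path changes
```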
Flushing: as of LyftData 2.0.2, the `file` output flushes each event write (the `flush-at-end` field exists in the schema but is not yet honored). For high-throughput exports, prefer a block store output with batching instead of relying on file buffering.
## Block store outputs: objects, GUIDs, and “no append”
Block store outputs write objects, not log files:
- There is no append operation for an existing object. Each put writes a complete object body.
- By default, the outputs append a GUID to avoid collisions. The default `guid-prefix` is `/`, so `object-name.name: processed/summary.json` becomes keys like `processed/summary.json/<uuid>`.
- If you want filename-style keys, keep the GUID enabled but set `guid-prefix`/`guid-suffix`. For example, `object-name: { name: processed/${partition||unknown}/summary }` with `guid-prefix: "-"` and `guid-suffix: ".json.gz"` yields keys like `processed/2026-03-19/summary-<uuid>.json.gz`.
- Set `disable-*-name-guid: true` only when you intend deterministic overwrites or your name is already unique.
- Preprocessors like `gzip` compress bytes but do not rename objects; include `.gz` in the key yourself.
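Putting the naming rules together, a block-store output that gzips its uploads and carries `.gz` in the key might look like the sketch below. The `preprocessors` field name is an assumption (this page names the gzip preprocessor but not the field that enables it); the bucket name reuses the example at the end of this page.

```yaml
output:
  s3:
    bucket-name: analytics-prod-archive
    object-name:
      # The key itself carries .ndjson.gz, since gzip does not rename objects.
      name: exports/${partition||unknown}/events
    guid-prefix: "-"
    guid-suffix: ".ndjson.gz"
    preprocessors:        # field name assumed; check your output schema
      - gzip
```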
For large batched uploads, prefer streaming multipart writes when supported:
`runtime-options: { prefer-streaming-outputs: true }`

This reduces peak memory by streaming batched uploads instead of buffering the entire combined batch in memory.
## Naming and partitioning with `${}` expansions
Many output fields accept `${}` expansions evaluated per event (including file paths and object names). Use `||` to provide defaults:

- `${partition||unknown}`
- `${host||unassigned}`
It is usually clearer to create a single partition field in your actions (and sanitize it), then reference it from outputs.
When batching is enabled on a block store output, expansions are resolved once per batch (using the last event in the batch). Ensure each batch contains only one logical partition (or use `${stat|batch_number}` instead of per-event fields).
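For example, a per-batch key that avoids per-event expansions entirely (the bucket name and key prefix are illustrative, reusing the example at the end of this page):

```yaml
output:
  s3:
    bucket-name: analytics-prod-archive
    object-name:
      # ${stat|batch_number} resolves once per batch, so mixed-partition batches are safe.
      name: exports/events-${stat|batch_number}
    guid-prefix: "-"
    guid-suffix: ".ndjson"
```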
Example: build a partition prefix and use it in an object-store key:
```yaml
actions:
  - slugify:
      input-field: host
      output-field: host_slug
  - add:
      output-fields:
        partition: host/${host_slug}
output:
  s3:
    bucket-name: analytics-prod-archive
    object-name:
      name: exports/${partition||unknown}/events
    guid-prefix: "-"
    guid-suffix: ".ndjson"
```

If you need a simple per-batch uniqueness knob, outputs can also expand `${stat|batch_number}` when batching is enabled (for example: `events-${stat|batch_number}.ndjson`).