File Handling

LyftData can write data to two broad classes of destinations:

  • Log files via the file output (newline-delimited events written to the local filesystem)
  • Block stores / object stores via outputs like file-store, s3, gcs, azure-blob, and web-dav-store (objects written per batch; no append)

These can look similar in the editor, but their on-disk (and on-cloud) semantics differ in important ways.

Pick the right output

| You need | Prefer | Why |
| --- | --- | --- |
| Append events to a local file you can tail | file | Writes one line per event and keeps a single growing file. |
| Durable, partitioned exports to a “bucket” | Block store outputs (s3, gcs, azure-blob, file-store, web-dav-store) | Writes whole objects; supports batching and preprocessors; avoids “append to object” pitfalls. |
| “Object store semantics” on disk | file-store | Mirrors cloud object stores while writing to a local directory. |

Where data is written

All outputs run on the worker executing the job:

  • If you run on the built-in worker, paths are on the server host/container.
  • If you run on an external worker, paths are on that worker host/container.

Use absolute paths and ensure the target directory is persistent (volume-mounted in containers) and writable.

Log Files (file): append vs overwrite (and flushing)

The Log Files output writes one line per event (NDJSON-style when writing full JSON events).

  • Default behavior is append: file-per-event: false appends to an existing file (and creates it if missing).
  • file-per-event: true writes each event by opening the path for a fresh write (so a constant path will be overwritten repeatedly). Use this only with a unique path (for example, by including ${} expansions in the filename).
  • compress-after: true gzips the previous file when the expanded path changes (and removes the uncompressed file).
  • truncate: true truncates a file the first time it is seen during a run (useful when re-running a job into the same path).
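The behaviors above can be combined in a single file output. A minimal sketch, assuming a `path` field for the target file (the field name is an assumption; file-per-event, compress-after, and truncate are documented above):

```yaml
output:
  file:
    # Assumed field name for the target file; include ${} expansions
    # only if you intend a new file per expansion value.
    path: /var/log/lyftdata/events-${partition||unknown}.ndjson
    file-per-event: false   # default: append to the file, creating it if missing
    compress-after: true    # gzip the previous file when the expanded path changes
    truncate: true          # truncate the first time this path is seen in a run
```

With `file-per-event: false` and a per-partition path, each partition gets its own growing NDJSON file; `compress-after` then gzips a file once events move on to a new path.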

Flushing: as of LyftData 2.0.2, the file output flushes each event write (the flush-at-end field exists in the schema but is not yet honored). For high-throughput exports, prefer a block store output with batching instead of relying on file buffering.

Block store outputs: objects, GUIDs, and “no append”

Block store outputs write objects, not log files:

  • There is no append operation for an existing object. Each put writes a complete object body.

  • By default, the outputs append a GUID to avoid collisions. The default guid-prefix is /, so object-name.name: processed/summary.json becomes keys like processed/summary.json/<uuid>.

  • If you want filename-style keys, keep the GUID enabled but set guid-prefix / guid-suffix. For example:

    • object-name: { name: processed/${partition||unknown}/summary }
    • guid-prefix: -
    • guid-suffix: .json.gz

    …yields keys like processed/2026-03-19/summary-<uuid>.json.gz.

  • Set disable-*-name-guid: true only when you intend deterministic overwrites or your name is already unique.

  • Preprocessors like gzip compress bytes but do not rename objects; include .gz in the key yourself.
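Putting the naming rules together, here is a hedged sketch of a block store output with filename-style keys and gzip compression. The `directory` field and the exact preprocessor syntax are assumptions; object-name, guid-prefix, and guid-suffix follow the example above:

```yaml
output:
  file-store:
    directory: /exports            # assumed field name for the local root
    preprocessors:
      - gzip: {}                   # assumed syntax; compresses bytes, does not rename
    object-name:
      name: processed/${partition||unknown}/summary
    guid-prefix: "-"
    guid-suffix: ".json.gz"        # include .gz yourself; gzip will not add it
```

This keeps the GUID for collision safety while producing keys like processed/2026-03-19/summary-&lt;uuid&gt;.json.gz.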

For large batched uploads, prefer streaming multipart writes when supported:

runtime-options:
  prefer-streaming-outputs: true

This reduces peak memory by streaming batched uploads instead of buffering the entire combined batch in memory.

Naming and partitioning with ${} expansions

Many output fields accept ${} expansions evaluated per event (including file paths and object names). Use || to provide defaults:

  • ${partition||unknown}
  • ${host||unassigned}

It is usually clearer to create a single partition field in your actions (and sanitize it), then reference it from outputs.

When batching is enabled on a block store output, expansions are resolved once per batch (using the last event in the batch). Ensure each batch contains only one logical partition (or use ${stat|batch_number} instead of per-event fields).

Example: build a partition prefix and use it in an object-store key:

actions:
  - slugify:
      input-field: host
      output-field: host_slug
  - add:
      output-fields:
        partition: host/${host_slug}
output:
  s3:
    bucket-name: analytics-prod-archive
    object-name:
      name: exports/${partition||unknown}/events
    guid-prefix: "-"
    guid-suffix: ".ndjson"

If you need a simple per-batch uniqueness knob, outputs can also expand ${stat|batch_number} when batching is enabled (for example: events-${stat|batch_number}.ndjson).
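As a sketch of that knob in context (field names other than those documented above are assumptions), a batched s3 output keyed by batch number might look like:

```yaml
output:
  s3:
    bucket-name: analytics-prod-archive   # assumed bucket name, as in the example above
    object-name:
      name: exports/events-${stat|batch_number}
    guid-suffix: ".ndjson"
```

Because ${stat|batch_number} is resolved per batch rather than per event, it sidesteps the last-event-wins caveat described earlier for per-event expansions under batching.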