Google Cloud Storage (GCS)
LyftData ships first-class support for reading from and writing to Google Cloud Storage (GCS).
Configure LyftData to read from GCS
Add the gcs input to a job. Key fields (names shown exactly as they appear in the job spec):
- `bucket-name` – target bucket (required).
- `object-names` – list of object names or prefixes. Leave empty to operate on every object surfaced by the selected mode; be cautious with very large buckets.
- `mode` – choose `list-objects`, `download-objects`, or `list-and-download-objects` depending on whether you need metadata only, already-identified objects, or listing plus download.
- `ignore-linebreaks` – set when an object should be surfaced as a single event instead of newline-delimited events.
- `timestamp-mode` – derive timestamps from creation time, last modified time, or processing time to drive downstream filtering.
- `include-regex` / `exclude-regex` and `maximum-age` – reduce the candidate list before downloading by pattern or by an age window like `2h` or `36h45m`.
- `fingerprinting` and `maximum-fingerprint-age` – prevent re-processing by persisting object hashes, and control how long fingerprints are kept.
- `credentials` – provide a service-account JSON, application default credentials, or a `gs://` URL through the available `GcsCredentials` variants. Values can be inlined, supplied via `{{ }}` context placeholders, or injected via `${ }` runtime expansions (prefer `${secret|scope/name}` / `${dyn|NAME}` for secrets).
- `preprocessors` – configure gzip/parquet/base64/extension handlers for content transformation during ingest.
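To make the `maximum-age` window concrete, the following is a minimal Python sketch of how a value such as `2h` or `36h45m` could be turned into a duration and compared against an object's last-modified time. It is an illustration only, not LyftData's implementation: `parse_age` and `within_age` are hypothetical helpers, and support for the `d` and `s` units is an assumption (only `h` and `m` appear in the examples above).

```python
import re
from datetime import datetime, timedelta, timezone

# Hypothetical unit table; only "h" and "m" appear in the documentation,
# "d" and "s" are assumed here for illustration.
_UNITS = {"d": "days", "h": "hours", "m": "minutes", "s": "seconds"}
_PART = re.compile(r"(\d+)([dhms])")


def parse_age(text: str) -> timedelta:
    """Parse an age window such as '2h' or '36h45m' into a timedelta."""
    parts = _PART.findall(text)
    # Reject input with leftover characters, e.g. "2 hours".
    if not parts or "".join(n + u for n, u in parts) != text:
        raise ValueError(f"unsupported age window: {text!r}")
    kwargs = {}
    for number, unit in parts:
        key = _UNITS[unit]
        kwargs[key] = kwargs.get(key, 0) + int(number)
    return timedelta(**kwargs)


def within_age(last_modified: datetime, maximum_age: str, now: datetime) -> bool:
    """Return True when the object is no older than the age window."""
    return now - last_modified <= parse_age(maximum_age)


now = datetime(2024, 5, 17, 12, 0, tzinfo=timezone.utc)
recent = datetime(2024, 5, 17, 7, 0, tzinfo=timezone.utc)   # 5 hours old
stale = datetime(2024, 5, 15, 12, 0, tzinfo=timezone.utc)   # 48 hours old
print(within_age(recent, "6h", now), within_age(stale, "36h45m", now))
```

Running the candidate list through a check like this before any download is what keeps a large bucket from being fetched in full.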
Example: list and download JSON objects
    input:
      gcs:
        bucket-name: analytics-prod
        object-names:
          - exports/daily/
        mode: list-and-download-objects
        ignore-linebreaks: true
        include-regex:
          - '\.json(\.gz)?$'
        maximum-age: 6h
        fingerprinting: true
        timestamp-mode: last-modified
        credentials:
          service-account:
            key: "${secret|gcp/analytics_gcs_reader}"
        preprocessors:
          - gzip

This configuration lists objects under `exports/daily/`, keeps only `.json` and `.json.gz` objects from the last six hours, downloads each object once, and surfaces the entire payload as a single event.
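Before deploying an include pattern it can help to try it against a few sample object names. The sketch below does this in Python for the pattern `\.json(\.gz)?$`. The object names are made up, and it assumes the pattern is applied as an unanchored search and that LyftData's regex engine agrees with Python's `re` for this simple expression.

```python
import re

# Pattern intended by the example: names ending in .json or .json.gz.
include = re.compile(r"\.json(\.gz)?$")

# Hypothetical object names for illustration.
names = [
    "exports/daily/2024-05-01.json",
    "exports/daily/2024-05-01.json.gz",
    "exports/daily/2024-05-01.csv",
    "exports/daily/json-notes.txt",
]

# Keep only names the pattern matches anywhere in the string.
selected = [name for name in names if include.search(name)]
print(selected)
# ['exports/daily/2024-05-01.json', 'exports/daily/2024-05-01.json.gz']
```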
Configure LyftData to write to GCS
Add the gcs output to a job. Key fields:
- `bucket-name` – destination bucket (required).
- `object-name` – literal or field-derived destination via `ObjectDestination`. Use `name: ...` for a fixed path or `field: ...` to reuse an event field.
- `mode` – set to `put` (default) to upload events or `delete` to remove objects by name.
- `disable-object-name-guid`, `guid-prefix`, `guid-suffix` – control the automatic GUID appended to uploads to avoid collisions. By default a GUID is appended as `/<uuid>`; keep the GUID enabled and set prefix/suffix if you want file-like keys.
- `input-field` – choose the field that contains the body to upload. When omitted, the full event is serialized after preprocessors run.
- `batch` and `retry` – tune batching for throughput and configure retries/timeouts for robustness.
- `track-schema` – enable when writing JSON so `__SCHEMA_NUMBER` is updated alongside the payload.
- `credentials` – reuse the same `GcsCredentials` forms as the input (service account key, application credentials, or `gs://` URLs).
- `preprocessors` – run gzip/base64/extension handlers before the payload is written to GCS.
See File Handling for details on GUID naming, batching, and ${} expansions.
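As a rough mental model of the GUID options, the Python sketch below composes an object key from a base name, a prefix, a GUID, and a suffix. This is an assumption-laden illustration, not LyftData's code: `object_key` is a hypothetical function, and it assumes that `guid-prefix` replaces the default `/` separator and that `guid-suffix` is appended directly after the GUID.

```python
import uuid


def object_key(name, guid_prefix="/", guid_suffix="", guid=None):
    """Compose a destination key as name + prefix + GUID + suffix.

    Assumed model only: the default prefix "/" mirrors the documented
    "/<uuid>" behaviour; the real composition rules may differ.
    """
    guid = guid or str(uuid.uuid4())
    return f"{name}{guid_prefix}{guid}{guid_suffix}"


# Default behaviour: the GUID becomes its own path segment.
print(object_key("processed/eu/summary"))

# File-like key: the GUID is joined with "-" and given an extension.
print(object_key("processed/eu/summary", guid_prefix="-", guid_suffix=".json.gz"))
```

Under this model the second call yields a key shaped like `processed/eu/summary-<uuid>.json.gz`, which sorts and downloads like an ordinary file while remaining collision-free.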
Example: upload processed events with file-like keys
    output:
      gcs:
        bucket-name: analytics-prod-archive
        object-name:
          name: processed/${partition||unknown}/summary
        guid-prefix: "-"
        guid-suffix: ".json.gz"
        input-field: payload
        preprocessors:
          - gzip
        track-schema: true
        credentials:
          service-account:
            path: /etc/lyftdata/service-account.json

Example: delete source objects after successful processing
    output:
      gcs:
        bucket-name: analytics-prod
        object-name:
          field: object_name
        mode: delete
        credentials:
          application-credentials:
            key: "${secret|gcp/analytics_gcs_writer}"

The delete output expects each incoming event to carry the object name (for example from the GCS input). Deletion runs without generating GUID prefixes.
Recommendations for files and folders
- keep individual files below 100–150 MB (compressed gzip or Parquet) for predictable processing latency
- organise exported data with directory-style prefixes such as `Y/M/`, `Y/M/D/`, or `Y/M/D/H/`
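A directory-style prefix of this kind can be derived from an event timestamp. The short Python sketch below shows one way to do it; `dated_prefix` is a hypothetical helper, and the zero-padded fields and UTC normalisation are assumptions rather than LyftData requirements.

```python
from datetime import datetime, timezone

# Assumed layouts: zero-padded month, day, and hour.
_FORMATS = {
    "M": "%Y/%m/",
    "D": "%Y/%m/%d/",
    "H": "%Y/%m/%d/%H/",
}


def dated_prefix(ts: datetime, granularity: str = "D") -> str:
    """Return a Y/M/, Y/M/D/, or Y/M/D/H/ prefix for a timezone-aware timestamp."""
    return ts.astimezone(timezone.utc).strftime(_FORMATS[granularity])


ts = datetime(2024, 5, 17, 9, 30, tzinfo=timezone.utc)
print(dated_prefix(ts, "M"))  # 2024/05/
print(dated_prefix(ts, "D"))  # 2024/05/17/
print(dated_prefix(ts, "H"))  # 2024/05/17/09/
```

Zero-padding keeps lexicographic order aligned with chronological order, so listing by prefix returns objects in time order.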