Pipelines Overview
LyftData pipelines are authored as jobs: one input → zero or more actions → one output. Jobs are usually created in the UI’s visual editor and executed by workers.
Build checklist
- Pick an authoring path: the Visual editor for guided configuration or the raw YAML view for power users.
- Shape the pipeline: review the Inputs catalog, Actions overview, and Outputs catalog to understand the primitives you can compose.
- Test early: use Run & Trace plus the variable expansion guide to validate field substitutions and context.
- Plan promotion: once a job works locally, read Deploying jobs and Advanced scheduling so staging and rollout stay predictable.
- Keep iterating: borrow patterns from the tutorials collection and reference the DSL docs whenever you need exact syntax.
Job definition
A job definition is the saved configuration describing how a job should run. You author it in the visual editor (including the YAML tab for “pipelines as code”), and you can export it as YAML/JSON when you want to store jobs in Git.
A job definition includes:
- Name - A user-defined label for the Job.
- Input - Exactly one data source (file, S3, HTTP, etc.), with optional scheduling or triggers.
- Actions - Zero or more transformations, enrichments, or filters applied to data as it flows.
- Output - Exactly one destination to write data (file, API endpoint, cloud storage, etc.).
Job definitions live on the LyftData Server, where they are versioned and deployed to workers.
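As a sketch, an exported job definition combining these parts might look like the following YAML. The field names here are illustrative assumptions, not the exact LyftData schema; consult the DSL docs for precise syntax.

```yaml
# Hypothetical job definition export; actual key names may differ.
name: s3-to-siem
input:
  type: s3                    # exactly one input
  bucket: logs-raw
  trigger:
    schedule: "0 * * * *"     # optional cron-style trigger (hourly)
actions:                      # zero or more transformations
  - type: rename_field
    from: src_ip
    to: source.ip
  - type: filter
    condition: "level != 'DEBUG'"
output:
  type: http                  # exactly one output
  endpoint: https://siem.example.com/ingest
```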
Important: LyftData enforces exactly one input and one output per Job. If you need:
- Multiple outputs (e.g., archiving to S3 plus sending to a SIEM), or
- Multiple inputs (fan-in from various sources),
chain Jobs together using worker channels.
Job types
LyftData supports several types of Jobs:
- Once-off Jobs: Run once, complete their task, then stop (no trigger configured).
- Streaming Jobs: Remain active and continuously process new data as it arrives.
- Scheduled Jobs: Trigger at a regular interval or according to a user-defined schedule.
Scheduling and Triggers
Scheduling in LyftData is configured in the input’s Trigger section, where applicable. Some inputs don’t require a schedule or trigger; those inputs are “always on,” continuously reading data as it appears.
For advanced cron, interval, and conditional patterns, see Advanced scheduling.
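As an illustration, a trigger on an input might be configured like this (hypothetical keys, shown only to convey the shape of the configuration):

```yaml
# Hypothetical trigger configuration on an input; exact keys may differ.
input:
  type: file
  path: /var/log/app/*.log
  trigger:
    schedule: "30 2 * * *"    # cron-style: run daily at 02:30
    # or an interval instead of cron, e.g.:
    # interval: 15m           # run every 15 minutes
```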
Job components
Each Job in LyftData consists of:
- Input
  - The single source that reads events (plaintext, compressed, or otherwise) from files, object storage, APIs, or any other supported data location.
  - If scheduling is applicable for this input type, configure it under the Trigger section (e.g., a cron-like schedule or a specific time interval).
- Actions (zero or more)
  - Actions process the input data (called “event data”) to filter, enrich, transform, or extract fields.
  - A variety of actions is available, from simple field renaming to advanced data manipulation or enrichment with external lookups.
- Output
  - The single destination that writes event data (for example, to files, APIs, or cloud storage).
  - If you need to send data to multiple destinations, see Worker channels below for composing multiple Jobs together.
Worker channels for multi-job pipelines
Because each job can only have one input and one output, you may occasionally need to fan in from multiple data sources or fan out to multiple destinations. LyftData provides worker channels to stitch these scenarios together.
Worker channels have a driver configured in the worker settings, which controls delivery semantics:
- standard: queue semantics; if multiple jobs subscribe, events are distributed among them (not duplicated).
- clone: broadcast semantics; every subscribed job receives a copy (use this for true fan-out to multiple downstream jobs).
- round-robin: distributes events across subscribed jobs in a fixed rotation.
- rule-based: routes events to one or more target jobs based on a field value.
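For instance, a worker’s channel settings might declare a driver per channel. The layout below is an illustrative assumption, not the exact settings schema:

```yaml
# Hypothetical worker settings fragment; key names are illustrative.
channels:
  archive-and-alert:
    driver: clone         # every subscriber gets a copy (fan-out)
  balanced-ingest:
    driver: round-robin   # rotate events across subscribers
  severity-router:
    driver: rule-based
    rules:
      - field: severity
        equals: critical
        target: pager-job   # route critical events here
      - default: archive-job
```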
In practice, you can create:
- A job with your real data input and an output pointing to a worker channel.
- One or more downstream jobs that read from that channel and deliver to final destinations.
This modular design gives you flexibility while keeping each individual Job simple.
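The pattern above could be sketched as two job definitions linked by a channel. The `worker-channel` type and other field names are assumptions for illustration:

```yaml
# Job 1: reads real data, writes to a worker channel (hypothetical schema).
name: ingest-to-channel
input:
  type: s3
  bucket: logs-raw
output:
  type: worker-channel
  channel: archive-and-alert
---
# Job 2: reads from the same channel, delivers to a final destination.
name: channel-to-archive
input:
  type: worker-channel
  channel: archive-and-alert
output:
  type: s3
  bucket: logs-archive
```

With a clone driver on the channel, a third job subscribed to `archive-and-alert` would receive its own copy of every event, giving you fan-out without complicating either job.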
For a guided example, see Chaining Jobs with Channels (Advanced).
Creating and editing jobs
1. Open the Visual Editor
   - Navigate to Jobs in the LyftData UI and click New Job (or edit an existing one).
2. Job Quick Setup Wizard
   - Choose an Input type: where the data is coming from.
   - Choose an Output type: where the data is going.
   - Optionally, add timestamps to all events.
   - Or skip the wizard and start with the default canvas.
3. Configure the Job Name
   - Click the job name at the top of the canvas to rename it.
4. Configure the Input
   - Select a single input type (e.g., file, S3, HTTP) and fill in any required connection details by selecting “Change Input” and then “Configure”.
   - If scheduling is relevant (for example, you want the input to run once daily), set it up under the input’s Trigger field.
   - When you are done, press the “Close Input” button.
5. Add Actions
   - Add, remove, or reorder actions as needed to transform or enrich your data.
6. Specify the Output
   - Choose your single output destination.
   - Fill in connection details (e.g., credentials, endpoint URLs).
7. (Optional) Test the Job
   - Use the Run or Run & Trace buttons to run the Job with a small sample of data and verify that it works as expected.
   - The number of samples and other parameters can be configured under “Run Output”.
8. Save and Deploy
   - Review the configuration, and then save it.
   - When you’re ready, stage and deploy your Job to one or more workers.
Once deployed
You can monitor Jobs’ status in the UI, where you’ll see logs, any error messages, and performance metrics.
Job execution
When you run or deploy a job, a worker pulls the staged version from the server and starts a job runtime using your current context values.
- Multithreaded Scaling: The runtime is multithreaded and leverages as many CPU cores as are available, making LyftData highly scalable even under heavy loads.
- Single Config, Single Runtime: Each job execution uses one staged version at a time. If you modify a job definition and stage a new version, future runs pick up the change, but ongoing executions keep running the version they started with until you redeploy or restart them.
- Monitoring and Logs: You can monitor each Job’s execution in the UI. Logs, error messages, and performance metrics help you diagnose issues or tune performance.
Summary
- One Input, One Output per Job keeps configurations straightforward.
- Actions let you shape and transform data en route.
- Worker Channels allow you to chain multiple Jobs together for more complex pipelines.
- Scheduling is set in the input configuration for inputs that support triggered runs.
For more details on available inputs, actions, and outputs, explore the related reference pages or check the examples in our Getting Started Guide.
Where to go next
- Move from samples to real data with the Day 1 production pipeline guide.
- Automate promotions and staging checks using the CI/CD workflow.
- Deep-dive into connectors via the integrations catalog.
- Learn how to monitor and scale your new jobs in the Operate section.