Job overview
Jobs are tasks which usually run in the worker process, but can be executed in their own process for extra isolation and fault tolerance.
Jobs can be ‘once off’, meaning they complete their task and stop; streaming, where they wait on changes in the system; or scheduled, where they execute on a regular interval. An example of a streaming job is one that reads system log files: as new logs appear, they are processed line by line. A scheduled job polls on a regular interval, for example by executing a command and reading its output, or by making an HTTP API request.
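As a rough illustration of the difference between these two modes (a sketch only, not this system’s actual job definitions; the function names and the `df -h` command are hypothetical):

```python
import subprocess
import time

def process(data):
    # Stand-in for the job's actions and output.
    print(data)

def streaming_job(path):
    """Streaming: wait for new lines in a log file and process each
    one as it arrives (like `tail -f`)."""
    with open(path) as f:
        f.seek(0, 2)                # start at the end; only new lines matter
        while True:
            line = f.readline()
            if line:
                process(line.rstrip("\n"))
            else:
                time.sleep(0.5)     # no new data yet; wait and retry

def scheduled_job(interval_seconds):
    """Scheduled: run a command on a regular interval and process its output."""
    while True:
        result = subprocess.run(["df", "-h"], capture_output=True, text=True)
        process(result.stdout)
        time.sleep(interval_seconds)
```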
Jobs process events, which are JSON; raw, uninterpreted data can enter the system, but it must first be quoted as JSON.
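For instance, a raw log line can be quoted into a JSON event like this (a minimal sketch; the `raw` field name is an assumption, not necessarily the convention this system uses):

```python
import json

raw_line = "eth0: link up, 1000 Mbps"   # raw, uninterpreted input
event = json.dumps({"raw": raw_line})   # quoted into a JSON event
print(event)                            # {"raw": "eth0: link up, 1000 Mbps"}
```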
A job consists of three parts:
- Input reading events from the outside world (the ‘source’)
- Actions processing these events: adding fields, removing fields, extracting fields from unstructured data, or converting from other formats. They transform the data.
- Output writing events out (the ‘sink’). This can trivially just print them, or even discard them, but typically involves moving data to storage (local or remote), HTTP POSTs, and so forth.
The actions are optional; one can have a useful job that simply copies events from its input to its output.
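Putting the three parts together, a job’s data flow is a small pipeline from source through actions to sink. The following is a minimal sketch of that shape, not this system’s actual API; the field names (`raw`, `host`, `level`) are hypothetical:

```python
import json
import sys

def source():
    """Input: read raw lines from stdin and quote them as JSON events."""
    for line in sys.stdin:
        yield {"raw": line.rstrip("\n")}

def actions(events):
    """Actions: transform each event, e.g. add a field and extract
    a severity level from the raw text."""
    for event in events:
        event["host"] = "web-01"        # add a field
        if "ERROR" in event["raw"]:
            event["level"] = "error"    # extract a field
        yield event

def sink(events):
    """Output: write events out; here we trivially print them."""
    for event in events:
        print(json.dumps(event))

# With no actions, a job simply copies input to output:
#   sink(source())
sink(actions(source()))
```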
Running a job from the editor creates a transient job: it won’t affect the rest of the system, and won’t be remembered. Any changes it makes will not be persisted.