Operate and Scale
LyftData operations focus on keeping the control plane healthy, the job fleet productive, and telemetry flowing to the right places. Use this page as the jumping-off point for your runbooks.
Daily checklist
- Confirm the server is reachable (for example, GET /api/liveness) and that you can sign in.
- Watch the live job status feed for stalled deploys, long retries, or sudden error spikes.
- Track worker health in the UI; investigate offline workers and growing backlogs quickly.
- Review errors and warnings in Logs & Issues and in your host logging system (systemd journal, Windows Event Log, or your central logging sink).
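The reachability check in the first item can be scripted so it runs from cron or a monitoring job. The following is a minimal sketch, not an official client: it assumes only that GET /api/liveness returns a 2xx status when the server is healthy. The function name check_liveness, the opener parameter (there to make the function testable without a network), and the example URL are hypothetical.

```python
import urllib.error
import urllib.request


def check_liveness(base_url, timeout=5.0, opener=urllib.request.urlopen):
    """Return True if GET <base_url>/api/liveness answers with a 2xx status.

    Assumption: a healthy server replies 2xx on this path; the response
    body is not inspected because its format is not described here.
    """
    url = base_url.rstrip("/") + "/api/liveness"
    try:
        with opener(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # Covers connection refused, DNS failure, timeouts, and HTTP
        # error statuses (urllib raises HTTPError for 4xx/5xx).
        return False


# Example call (hypothetical address, replace with your server's URL):
#   check_liveness("https://lyftdata.example.internal")
```

A False result only says the endpoint did not answer successfully within the timeout; confirming that you can sign in and reviewing the job feed remain manual steps.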
Runbooks by theme
- Daily operations: the Daily operations playbook keeps the control plane healthy with checklists and drills.
- Observability & alerts: Monitoring LyftData covers metrics, dashboards, and alert wiring.
- Logs and live events: Logs & Issues and Messages are your first stop for triage.
- Resilience & recovery: Backup & recovery explains snapshot cadence, restores, and disaster recovery tests.
- Worker provisioning: Worker auto enrollment covers shared-secret bootstrap flows and what to disable afterwards.
- Capacity planning: Scaling LyftData walks through worker sizing, channel fan-out strategies, and deployment hygiene.
- Security posture: Security hardening documents TLS, secret rotation, and RBAC guidance.
- Telemetry: Telemetry explains what LyftData collects locally and how to access it.
Releases and change management
- Before upgrades, note your current version (lyftdata --version) and review the release notes.
- Use the downloads portal for current builds and checksums.
- Keep a simple change log for your environment (what changed, who approved it, and how to roll back).
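One lightweight way to keep such a change log is an append-only file with one JSON record per change. The sketch below is an illustration under assumptions, not a LyftData feature: the ChangeEntry fields mirror the three items named above (what changed, who approved it, how to roll back), while version_before, recorded_at, the function name append_change, and the file layout are hypothetical choices.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass
class ChangeEntry:
    what_changed: str
    approved_by: str
    rollback_plan: str
    version_before: str = ""  # e.g. pasted from `lyftdata --version`
    recorded_at: str = ""     # filled in automatically when left empty


def append_change(path, entry):
    """Append one entry to `path` as a single JSON line and return it."""
    if not entry.recorded_at:
        entry.recorded_at = datetime.now(timezone.utc).isoformat()
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(json.dumps(asdict(entry)) + "\n")
    return entry
```

Appending rather than rewriting keeps earlier entries intact, and one record per line makes the file easy to search or load into another tool later.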
Where to go next
- Follow the Daily operations playbook for your everyday checklist and weekly reviews.
- Set up dashboards using the Monitoring guide and plan capacity with the Scaling runbook.
- Harden your deployment via Security guidance and Backup & recovery.
- Track upcoming changes in the release notes and communicate planned upgrades to stakeholders.