Job Description
The Role
This role sits in the core Platform/SRE team that owns production. You’ll work directly on incident response, on-call, system reliability, and day-to-day operations for Heidi’s platform.
What you’ll do
Participate in on-call and incident response: Respond to production incidents, contribute to service restoration, and support clear communication during incidents. Over time, take increasing responsibility for leading incidents end-to-end.
Improve operational reliability: Identify recurring issues and reliability risks, and drive fixes through better alerting, automation, system changes, or process improvements.
Own parts of the production environment: Operate and improve Kubernetes clusters, cloud infrastructure, and core platform services, taking on more ownership as you become familiar with the systems.
Strengthen observability: Improve dashboards, alerts, logs, and traces...
Ready to Apply?
Submit your application today and join our talented team at Heidi Health Ltd.
Job Details
- Location: London, England
- Job Type: Full-time
- Category: Engineering
- Posted Date: June 29, 2026
- Application Deadline: August 08, 2026