Site Reliability Engineer

BentoML
📍 Work from home, Singapore 💼 Full-time 🕒 Posted March 02, 2026

Job Description

About BentoML

BentoML is a leading inference platform provider that helps AI teams run large language models and other generative AI workloads at scale. With support from investors such as DCM, enterprises around the world rely on us for consistent scalability and performance in production. Our portfolio includes both open source and commercial products, and our goal is to help each team build its own competitive advantage through AI.

Role

Join BentoML as a Senior Site Reliability Engineer and take charge of the infrastructure that delivers large language model and generative AI services worldwide. You will architect and operate Kubernetes clusters across AWS, Google Cloud, and on-premises environments, turning vast GPU fleets into responsive inference pools. Your work will span writing clean Terraform code, refining GitOps pipelines, tuning Prometheus, and leading incident response. You will set service level objectives that matter, guide teammates throu...

Ready to Apply?

Submit your application today and join our talented team at BentoML.

Job Details

  • Location: Work from home, Singapore
  • Job Type: Full-time
  • Category: Quality Engineering
  • Posted Date: March 02, 2026
  • Application Deadline: April 11, 2026