Staff ML Engineer - Efficient ML & Low-Latency Inference

Moonlake
📍 San Mateo, Rizal, Philippines 💼 Full-time 🕒 Posted March 01, 2026

Job Description

Introducing Moonlake, AI for creating world simulations.

Scope of Work

Training efficiency

  • Dataloaders, operator fusion, activation rematerialization (gradient checkpointing).

  • FSDP/ZeRO, tensor and pipeline parallelism; NCCL tuning.

GPU + kernel performance

  • Nsight profiling, Triton/CUDA kernels, fused ops.

  • Flash-attention–style speedups, sequence packing, KV-cache tricks.

Inference optimization

  • Low-latency serving, continuous batching, speculative decoding.

  • Quantization (GPTQ/AWQ), distillation, pruning.

Infra + reliability

  • SLURM/K8s multi-node jobs, checkpoint hygiene.

  • Determinism, env pinning, GPU failure handling.

We are committed to being an on-site, in-person team, currently based in San Mateo.

Job Details

  • Location: San Mateo, Rizal
  • Job Type: Full-time
  • Category: IT & Technology
  • Posted Date: March 01, 2026
  • Application Deadline: April 10, 2026