Job Description
We are looking for an experienced ML Engineer with a strong background in machine learning who is ready to take on the tasks and responsibilities of this role.
ML Infrastructure
Performance Engineer
Focus:
This role focuses on the "serving plane." The engineer will integrate high-speed inference runtimes with streaming loaders and take ownership of the performance benchmarking mandate.
Key Responsibilities:
- Integrate SGLang with the Run:ai Model Streamer to enable concurrent tensor streaming directly to GPU memory, reducing model "cold start" times.
- Optimize SGLang's backend runtime, leveraging features like RadixAttention for prefix caching and compressed finite-state machines for faster decoding.
- Design and execute rigorous performance benchmarking suites to...
Ready to Apply?
Submit your application today and join our talented team at Yantran LLC.
Job Details
- Location: Austin, Texas
- Job Type: Full-time
- Category: other-general
- Posted Date: February 20, 2026
- Application Deadline: April 01, 2026