Job Description
Inference Optimization
- Drive time to first token (TTFT) below 400 ms for multi-step agent pipelines
- Streaming optimization: deliver the first token to the user while sub-agents are still running
- KV cache strategy, prompt compression, dynamic context window management
- Multi-provider routing: model selection by latency, cost, and task type across OpenAI, Anthropic, Gemini, and open-weight models
Agent Architecture
- Design and implement Plan-Execute-Synthesize pipelines that run sub-agents in parallel DAGs, not sequential chains
- Build reliable orchestration on top of Temporal: retries, timeouts, partial failure recovery, idempotency
- Structured output enforcement: JSON schema validation, retry loops on malformed LLM output, graceful degradation
- Tool call design: schemas that LLMs follow reliably across providers
Evaluation & Harness
- Own the e...
Ready to Apply?
Submit your application today and join our talented team at Zyoin Group.
Job Details
- Location: Bangalore Urban, Karnataka
- Job Type: Full-time
- Category: architecture, backend, design, infrastructure, json, kafka, platform, postgresql, product, python, real-time, red, search, sed, test, testing
- Posted Date: June 12, 2026
- Application Deadline: July 22, 2026