Working Student (m/f/d) LLM Agent Evaluation & Benchmarking

Agile Robots SE
📍 Munich, Bavaria, Germany 💼 Part-time 🕒 Posted June 24, 2026

Job Description

About the role

We are looking for a Working Student (m/f/d) in LLM Agent Evaluation & Benchmarking. In this role, you will design and build an agent-agnostic benchmarking harness, run comparative evaluations across frontier and local models, and translate the findings into improvements to prompts, guards, and tool schemas.

Your Responsibilities

  • Harness Development: Design and build an agent-agnostic benchmarking harness that runs versioned task suites against frontier and local models, keeping every run reproducible and under version control.
  • Task Suite Design: Define and maintain evaluation task suites that measure task success, grounding accuracy, latency, and cost across the agent portfolio.
  • Model Evaluation: Run period...

Job Details

  • Location: Munich, Bavaria
  • Job Type: Part-time
  • Category: Computer Occupations
  • Posted: June 24, 2026
  • Application Deadline: August 3, 2026