GenAI Evaluation Engineer

Diverse Lynx
📍 Bellevue, Geneva, Switzerland 💼 Full-time 🕒 Posted February 24, 2026

Job Description

  • Strong understanding of LLMs and generative AI concepts, including model behavior and output evaluation
  • Experience with AI evaluation and benchmarking methodologies, including baseline creation and model comparison
  • Hands-on expertise in evaluation testing, including creating structured test suites to measure accuracy, relevance, safety, and performance
  • Ability to define and apply evaluation metrics (precision/recall, BLEU/ROUGE, F1, hallucination rate, latency, cost per output)
  • Prompt engineering and prompt testing experience across zero-shot, few-shot, and system prompt scenarios
  • Python or other programming languages for automation, data analysis, batch evaluation execution, and API integration
  • Experience with evaluation tools/frameworks (OpenAI Evals, HuggingFace evals, Promptfoo, Ragas, DeepEval, LM Eval Harness)
  • Ability to create datasets, test cases, benchmarks, and ground truth references for consistent...
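
For candidates gauging the level of the metrics work listed above, here is a minimal Python sketch of how precision, recall, F1, hallucination rate, and mean latency might be summarized over a batch of evaluated outputs. The `EvalRecord` fields and the `summarize` helper are hypothetical names chosen for illustration; they are not part of any framework named in this posting, and the role's actual tooling may differ.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class EvalRecord:
    """One evaluated model output (hypothetical schema, for illustration only)."""
    predicted: bool     # the model's output was classified as positive
    expected: bool      # the ground-truth label is positive
    hallucinated: bool  # a reviewer or judge flagged unsupported content
    latency_s: float    # wall-clock time for the request, in seconds


def summarize(records: List[EvalRecord]) -> Dict[str, float]:
    """Aggregate basic quality and performance metrics over a batch."""
    n = len(records)
    if n == 0:
        # Avoid division by zero on an empty batch.
        return {"precision": 0.0, "recall": 0.0, "f1": 0.0,
                "hallucination_rate": 0.0, "mean_latency_s": 0.0}

    tp = sum(r.predicted and r.expected for r in records)
    fp = sum(r.predicted and not r.expected for r in records)
    fn = sum((not r.predicted) and r.expected for r in records)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "hallucination_rate": sum(r.hallucinated for r in records) / n,
        "mean_latency_s": sum(r.latency_s for r in records) / n,
    }
```

In practice a batch of records like this would typically be produced by running a fixed test suite against a model API, so that the same summary can be compared across model versions or prompt variants.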

Job Details

  • Location: Bellevue, Geneva
  • Job Type: Full-time
  • Category: Other-General
  • Posted Date: February 24, 2026
  • Application Deadline: April 05, 2026