Job Description
Accountabilities
- Own the Training Environment data architecture end-to-end: dataset design and schema for all ML training pipelines, including dialog corpora for LLM training, conversational steps for NLU models, annotated evaluation sets, and whole-call recordings for speech-to-speech model development.
- Define and govern data selection and sampling strategy: establish criteria that determine which production conversations have the highest training value, including diversity-optimized sampling, confidence-based filtering, edge-case prioritization, and deduplication strategies.
- Build and maintain the data catalog and dataset discovery infrastructure: enable ML engineers across LLM, NLU, Speech, and Agentic teams to find, understand, and use training data without friction.
- Define annotation pipeline architecture: establish requirements for data labeling — intent annotation, entity tagging, dialog act classification, task completion scoring, and age...
Ready to Apply?
Submit your application today and join our talented team at Omilia.
Job Details
- Location: Greece, Greece
- Job Type: Full-time
- Category: Computer Occupations
- Posted Date: June 12, 2026
- Application Deadline: July 22, 2026