Job Description
The Role
We are looking for a Research Engineer, Multimodal to help deliver frontier-quality multimodal datasets, evaluations, and RL environments that improve state-of-the-art models for leading AI labs and enterprise clients.
This is a hands-on, research-facing technical role. You will work directly with customer researchers and engineers to turn their multimodal model improvement goals into concrete data specs — designing tasks that test genuine cross-modal reasoning, not just text comprehension with images attached.
We're targeting candidates with roughly 4–5 years of experience building or improving multimodal AI systems — especially where strong results depended on data curation across modalities, evaluation design, or building systems that handle the complexity of real-world documents, charts, and visual interfaces.
What You'll Do
1. Design and deliver multimodal datasets and evaluation environments
Work with customer researchers to define data requirements for multimodal capabilities: chart reading, document QA, UI understanding, diagram-based reasoning, OCR-aware tasks, and visual grounding.
Design task suites that test true multimodal reasoning — tasks where the answer requires integrating information across text, images, tables, and document structure, not just surface-level pattern matching.
Build ground-truth signals and verification systems for multimodal tasks, where correctness often depends on spatial relationships, visual context, and cross-reference between modalities.
Define task formats and data schemas that capture the full richness of multimodal inputs while remaining tractable for model training.
2. Build quality and validation systems for multimodal data
Perform deep audits of produced data — spotting subtle errors in visual grounding, incorrect chart readings, OCR artifacts, layout misinterpretations, and ambiguous visual references.
Implement automated validation: consistency checks across modalities, format and schema validation, deduplication, and difficulty/diversity controls.
Where appropriate, develop synthetic data generation pipelines: programmatic chart/table generation, document layout templating, controlled visual perturbations, and augmentation across visual conditions.
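One pattern behind such pipelines, sketched minimally here for a hypothetical bar-chart QA task: generate the underlying chart data programmatically so ground truth is known by construction, and apply ambiguity checks (like rejecting tied maxima) at generation time rather than after labeling. The function name and schema below are illustrative, not an existing Turing API; rendering the spec to an image (e.g. with a plotting library) would be a separate step.

```python
import random

def make_bar_chart_example(seed):
    """Generate a synthetic bar-chart spec plus QA pairs whose answers
    are correct by construction (no human labeling required)."""
    rng = random.Random(seed)
    categories = rng.sample(["North", "South", "East", "West", "Central"], k=4)
    values = {c: rng.randint(10, 100) for c in categories}

    # The chart spec is what a downstream renderer would draw.
    chart_spec = {"type": "bar", "x": categories,
                  "y": [values[c] for c in categories]}

    qa_pairs = [
        {"q": f"What is the value of {categories[0]}?",
         "a": str(values[categories[0]])},
        {"q": "What is the total across all categories?",
         "a": str(sum(values.values()))},
    ]

    # Ambiguity control: "highest" is only well-defined if the max is unique,
    # so the tied case is dropped instead of producing a debatable label.
    top = max(values, key=values.get)
    if list(values.values()).count(values[top]) == 1:
        qa_pairs.append({"q": "Which category has the highest value?", "a": top})

    return chart_spec, qa_pairs
```

Because answers derive from the same values that drive the rendering, verification reduces to checking the generator rather than re-reading the image.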
3. Prove impact through evaluations and training runs
Design and run evals targeting multimodal capabilities aligned with customer goals.
Produce analysis connecting data to outcomes: pre/post comparisons on targeted multimodal tasks, error breakdowns by modality and task type, and ablations identifying which data attributes drive model lift.
When needed, run fine-tuning or RL experiments (or partner with research) to demonstrate measurable multimodal improvement.
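A minimal sketch of the kind of analysis described above, assuming eval results are recorded as per-item dicts with a modality tag and a pre/post stage label (the record format is illustrative, not a prescribed schema):

```python
from collections import defaultdict

def modality_breakdown(results):
    """Aggregate per-item eval results into per-modality accuracy,
    pre vs. post, so lift can be attributed to specific capabilities."""
    # tallies[modality][stage] = [num_correct, num_total]
    tallies = defaultdict(lambda: {"pre": [0, 0], "post": [0, 0]})
    for r in results:
        stage = tallies[r["modality"]][r["stage"]]
        stage[0] += r["correct"]
        stage[1] += 1

    report = {}
    for modality, t in tallies.items():
        pre = t["pre"][0] / t["pre"][1] if t["pre"][1] else None
        post = t["post"][0] / t["post"][1] if t["post"][1] else None
        delta = post - pre if pre is not None and post is not None else None
        report[modality] = {"pre": pre, "post": post, "delta": delta}
    return report
```

Slicing the same records by task type (or by data attribute for ablations) follows the same aggregation pattern.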
4. Collaborate with cross-functional delivery teams
Provide clear specs, visual examples, and edge cases to engineers, QA specialists, domain SMEs, and data production teams.
Run fast feedback loops grounded in multimodal quality metrics.
Review and improve outputs from large-scale multimodal data creation efforts, maintaining a high bar for cross-modal correctness and realism.
Who We're Looking For
1. Required Qualifications
4–5 years of experience building or improving AI systems involving multimodal data — images, documents, charts, tables, or visual interfaces combined with text.
Hands-on experience with at least one of: document understanding, chart/diagram reasoning, OCR systems, UI understanding, or vision-language models.
Strong intuition for what makes a good multimodal task: genuine cross-modal reasoning requirements, clear ground truth, and resistance to text-only shortcuts.
Demonstrated ability to be extremely detail-oriented — you catch when a chart label is misread, a table cell is misaligned, or a visual reference is ambiguous.
Python proficiency required; comfort with SQL and structured data workflows strongly preferred.
Ability to communicate clearly with researchers and engineers — turning multimodal model goals into concrete data specs.
2. Highly Valued Experience
RL or post-training experience: RLHF/RLAIF, reward modeling, verifier training, or environment design applied to multimodal tasks.
Experience with vision-language models (CLIP, LLaVA, GPT-4V, Gemini, or equivalent).
Agentic evaluation involving visual grounding: browser-use agents, UI navigation, or screen-based task completion.
Experience building evaluation frameworks or benchmarks for multimodal systems.
Audio/video understanding experience (optional but valued).
Why Turing
Work directly with the world's leading AI labs on the multimodal datasets and evaluations training the next generation of vision-language models.
Real impact: your data will directly shape how models learn to read charts, understand documents, navigate interfaces, and reason across modalities.
Talent-dense team with high autonomy, rapid iteration, and an exceptional learning curve.
Values:
We are Client First: We put our clients at the center of everything we do, because their success is the ultimate measure of our value.
We work at Start-Up Speed: We move fast, stay agile, and favor action, because momentum is the foundation of perfection.
We are AI Forward: We help our clients build the future of AI and apply it in our own roles and workflows to amplify productivity.
Advantages of joining Turing:
Amazing work culture (Super collaborative & supportive work environment; 5 days a week)
Awesome colleagues (Surround yourself with top talent from Meta, Google, LinkedIn etc. as well as people with deep startup experience)
Competitive compensation
Flexible working hours
Don’t meet every single requirement?
Studies have shown that women and people of color are less likely to apply to jobs unless they meet every single qualification. Turing is proud to be an equal opportunity employer. We do not discriminate on the basis of race, religion, color, national origin, gender, gender identity, sexual orientation, age, marital status, disability, protected veteran status, or any other legally protected characteristic. At Turing we are dedicated to building a diverse, inclusive, and authentic workplace, so if you're excited about this role but your past experience doesn't align perfectly with every qualification in the job description, we encourage you to apply anyway. You may be just the right candidate for this or other roles.
For applicants from the European Union, please review Turing's GDPR notice here.