Director of Engineering - AI Evaluations & Experimentation

Own Company

Software Engineering, Data Science
New York, NY, USA
Posted on Feb 3, 2026

Description

Overview of the Role

We are seeking a Director of Engineering to lead our AI Agent Evaluation and Experimentation Platform team. In this role, you'll own the end-to-end evaluation and experimentation lifecycle for both agentic systems and traditional ML models. You'll be part of Salesforce's AI Engineering organization, working at the forefront of the agentic era as we build Agentforce—the future of AI-powered CRM. Your team will be responsible for building the critical infrastructure that ensures we ship high-quality, safe, and performant AI systems with confidence.

Responsibilities

  • Define and execute the technical vision for evaluation and experimentation across AI agents and traditional ML models

  • Own offline evaluation, regression testing, scenario-based simulations, and multi-turn agent testing infrastructure

  • Build automated evaluation systems including LLM-as-Judge, rule-based scoring, and hybrid evaluation approaches

  • Design and operate online evaluation, observability, and continuous performance monitoring for agent behavior

  • Lead development of self-service evaluation and experimentation tooling for agent workflows, tool use, memory, and planning

  • Support experimentation for both real-time agents and batch or online traditional ML models

  • Integrate evaluation and experimentation pipelines into CI/CD workflows and release quality gates

  • Drive adoption of evaluation and experimentation best practices across engineering and AI teams

  • Set technical direction, review designs, and raise the bar on engineering quality

  • Lead and develop a senior engineering team, fostering innovation and excellence

  • Partner with AI research, product, security, and Responsible AI teams on evaluation and experimentation strategy

Through this role, you'll gain deep experience building large-scale AI infrastructure, shape the future of how Salesforce evaluates and ships AI systems, and make a direct impact on the quality and reliability of AI products used by millions of customers worldwide.

Required Qualifications

  • A related technical degree is required

  • 10+ years of engineering experience, with 5+ years leading AI/ML teams

  • Proven ability to lead senior engineers and engineering managers

  • Experience building and operating experimentation platforms for AI systems or ML products

  • Strong understanding of LLM-based agentic architectures and traditional ML systems

  • Experience designing experimentation frameworks for online and offline ML workflows

  • Experience building evaluation systems for models and agents, including offline tests, regression suites, online monitoring, and LLM-as-Judge-style approaches

  • Strong background in AI agents and LLM systems, including tool use, multi-step workflows, RAG, prompt and policy management, and common agent failure modes

  • Experience evaluating agent behavior across multi-step workflows and tool-using systems

  • Hands-on experience designing evaluation frameworks for AI systems

  • Experience with offline benchmarking, regression testing, and scenario-based evaluation

  • Experience with automated evaluation approaches such as LLM-as-Judge and hybrid scoring systems

  • Experience with online experimentation methods including A/B testing, shadow testing, and canary deployments

  • Experience integrating evaluation and experimentation into CI/CD pipelines and release gating

  • Experience with data pipelines, metrics systems, and observability tooling

  • Strong cross-functional communication and stakeholder alignment skills

Preferred Qualifications

  • A master's or Ph.D. degree in computer science, machine learning, artificial intelligence, or a related field

  • Experience with data and ML platforms (e.g., Snowflake-centric workflows, feature stores, training pipelines)

  • Experience working in high-scale production AI/ML environments

Benefits & Perks

Check out our benefits site, which explains our various benefits, including wellbeing reimbursement, generous parental leave, adoption assistance, fertility benefits, and more.