Description
We are looking for a Lead AI Engineer to build next-generation AI and ML systems at Salesforce. This role focuses on developing intelligent decisioning systems and building an agent flywheel: a system of feedback loops that continuously evaluate, optimize, and improve agent performance over time.
This is an applied AI role with strong data and systems ownership. You will build not only models and agents but also the data pipelines and evaluation loops that enable continuous learning in production.
What You’ll Do
Build the Agent Flywheel
Design feedback loops that enable agents and ML systems to improve from real-world outcomes
Track outcomes (engagement, conversion, quality) and evaluate agent performance
Build pipelines that collect and structure agent traces into training and evaluation datasets
Drive continuous improvement via prompting, policies, model selection, and fine-tuning
Develop ML & Agent Systems
Build and deploy ML models (classification, ranking, forecasting, recommendation)
Design AI agents that combine LLM reasoning, tool usage, and ML decisioning
Implement reusable patterns for multi-step reasoning, tool orchestration, and structured outputs
Integrate models and agents into business-critical workflows
Own Data & Model Pipelines
Design and build scalable data pipelines (batch and near real-time) for training, evaluation, and inference
Transform raw interaction data into features, labels, and evaluation datasets
Enable continuous retraining and evaluation through tightly coupled data and model pipelines
Ensure data quality, consistency, and reliability
Evaluation & Experimentation
Build offline and online evaluation frameworks
Develop evaluation datasets, golden traces, and regression-style test sets
Run A/B experiments and track key metrics (quality, revenue impact, latency, etc.)
Use production signals to drive continuous optimization
Systems & API Development
Build scalable Python services and APIs powering agent workflows
Collaborate with platform teams while owning application-level systems
Ensure reliability, observability, and performance
Qualifications
Core Requirements
6+ years in AI/ML engineering or applied data science
Strong Python experience in production systems
Proven experience building and deploying ML models
Experience building data pipelines (ETL/ELT, batch or streaming)
Experience with APIs and backend systems
Agent & LLM Experience
Experience with LLM-powered systems (prompting, orchestration, evaluation)
Familiarity with agent workflows and tool usage
Experience with evaluation loops, agent traces, or iterative improvement systems preferred
Data & Systems Expertise
Experience building data pipelines supporting ML systems
Familiarity with tools like Spark, Airflow/Dagster, Snowflake/BigQuery
Understanding of data quality, lineage, and reproducibility
Modeling & Experimentation
Strong understanding of supervised learning and evaluation methods
Experience with A/B testing and experimentation
Ability to design systems combining ML, LLMs, and business logic
Preferred Qualifications
Experience with agent improvement systems (scoring, optimization loops)
Exposure to evaluation tools (e.g., LangSmith, Braintrust, or similar)
Experience with large-scale experimentation platforms
Familiarity with enterprise SaaS or CRM
What Success Looks Like
Agents and ML models improve continuously via feedback loops
Reliable data and evaluation pipelines power the agent flywheel
Measurable impact on business metrics (conversion, revenue, efficiency)
Fast, safe iteration enabled by strong evaluation systems
For roles in San Francisco and Los Angeles: Pursuant to the San Francisco Fair Chance Ordinance and the Los Angeles Fair Chance Initiative for Hiring, Salesforce will consider for employment qualified applicants with arrest and conviction records.