For roles in San Francisco and Los Angeles: Pursuant to the San Francisco Fair Chance Ordinance and the Los Angeles Fair Chance Initiative for Hiring, Salesforce will consider for employment qualified applicants with arrest and conviction records.

Lead AI Engineer, Data Solutions

Description

We are looking for a Lead AI Engineer to build next-generation AI and ML systems at Salesforce. This role focuses on developing intelligent decisioning systems and building an agent flywheel—a system of feedback loops that continuously evaluate, optimize, and improve agent performance over time.

This is an applied AI role with strong data and systems ownership. You will build models and agents, along with the data pipelines and evaluation loops that enable continuous learning in production.

What You’ll Do

Build the Agent Flywheel

Design feedback loops that enable agents and ML systems to improve from real-world outcomes

Track outcomes (engagement, conversion, quality) and evaluate agent performance

Build pipelines that collect and structure agent traces into training and evaluation datasets

Drive continuous improvement via prompting, policies, model selection, and fine-tuning

Develop ML & Agent Systems

Build and deploy ML models (classification, ranking, forecasting, recommendation)

Design AI agents that combine LLM reasoning, tool usage, and ML decisioning

Implement reusable patterns for multi-step reasoning, tool orchestration, and structured outputs

Integrate models and agents into business-critical workflows

Own Data & Model Pipelines

Design and build scalable data pipelines (batch and near real-time) for training, evaluation, and inference

Transform raw interaction data into features, labels, and evaluation datasets

Enable continuous retraining and evaluation through tightly coupled data and model pipelines

Ensure data quality, consistency, and reliability

Evaluation & Experimentation

Build offline and online evaluation frameworks

Develop evaluation datasets, golden traces, and regression-style test sets

Run A/B experiments and track key metrics (quality, revenue impact, latency, etc.)

Use production signals to drive continuous optimization

Systems & API Development

Build scalable Python services and APIs powering agent workflows

Collaborate with platform teams while owning application-level systems

Ensure reliability, observability, and performance

Qualifications

Core Requirements

6+ years in AI/ML engineering or applied data science

Strong Python experience in production systems

Proven experience building and deploying ML models

Experience building data pipelines (ETL/ELT, batch or streaming)

Experience with APIs and backend systems

Agent & LLM Experience

Experience with LLM-powered systems (prompting, orchestration, evaluation)

Familiarity with agent workflows and tool usage

Experience with evaluation loops, agent traces, or iterative improvement systems preferred

Data & Systems Expertise

Experience building data pipelines supporting ML systems

Familiarity with tools like Spark, Airflow/Dagster, Snowflake/BigQuery

Understanding of data quality, lineage, and reproducibility

Modeling & Experimentation

Strong understanding of supervised learning and evaluation methods

Experience with A/B testing and experimentation

Ability to design systems combining ML, LLMs, and business logic

Preferred Qualifications

Experience with agent improvement systems (scoring, optimization loops)

Exposure to evaluation tools (e.g., LangSmith, Braintrust, or similar)

Experience with large-scale experimentation platforms

Familiarity with enterprise SaaS or CRM

What Success Looks Like

Agents and ML models improve continuously via feedback loops

Reliable data and evaluation pipelines power the agent flywheel

Measurable impact on business metrics (conversion, revenue, efficiency)

Fast, safe iteration enabled by strong evaluation systems