Software Engineer - Principal Member of Technical Staff (PMTS) - Availability

Own Company

Software Engineering, IT
San Francisco, CA, USA
Posted on Feb 6, 2026

Description

The Availability Standards team is part of the overall Salesforce technology organization. We manage the high-level frameworks used to measure platform uptime and performance, bridging the gap between centralized reporting and the individual engineering teams that own specific services. We follow a consultative engineering approach in which our experts partner with service owners to build a deep understanding of service health, telemetry, and automated testing. This expertise allows our team to advocate for the customer and influence the product roadmap by ensuring that every service team has the visibility it needs to maintain world-class availability.

Role Description:

The Engineering Availability Standards position is a critical role designed for a seasoned engineering veteran who has experience managing, leading, or coordinating with high-scale cloud services. Your mission is to transform how we calculate, visualize, and act upon platform health data. You will serve as the technical bridge between our global availability standards and the distributed engineering teams that power our infrastructure.

You will be responsible for shifting our monitoring strategy from simple reporting to active, high-fidelity signals that engineering teams use for real-time alerting and incident response. This role requires the ability to influence technical roadmaps across different product families and to automate the integration of reliability testing and observability into standard software development lifecycles.

Job Responsibilities:

  • Utilize software engineering skills and production experience to provide input into long-range platform requirements and operational guidelines, with a focus on making health data actionable for service owners.

  • Analyze and understand how service teams manage their telemetry, and help drive continuous improvement of health signals based on knowledge of specific service architectures.

  • Partner with internal engineering teams to integrate global availability standards into their existing monitoring pipelines, dashboards, and automated alerting flows.

  • Identify and mitigate friction in the onboarding process by leveraging existing automated test suites to create high-quality, streamlined reliability signals with minimal manual effort.

  • Serve as a technical subject matter expert to ensure that centralized infrastructure services (logging, monitoring, and data platforms) are optimized to support the needs of individual service owners.

  • Lead the integration of failure signals into standard engineering workflows, ensuring that detected issues result in automated work items and proactive investigations.

  • Deliver presentations highlighting availability metrics, reliability trends, and success stories to diverse engineering and leadership audiences.

Required Skills:

  • A related technical degree is required.

  • 5+ years of proven experience in production environments (this could include previous experience as a software engineer, systems engineer, service owner, or lead developer).

  • Fluency in Java or a similar object-oriented language (Python, C++, etc.) to provide input on platform requirements and automation.

  • Deep understanding of telemetry systems and experience building or managing production monitoring and alerting frameworks.

  • Experience using Linux environments and the ability to navigate complex, distributed system architectures.

  • Familiarity with core web technologies: HTTP, JSON, REST, and XML.

Desired Skills:

  • Previous experience in a Service Owner or Technical Lead role within a high-scale, multi-tenant cloud environment.

  • Strong background in Site Reliability Engineering (SRE) principles and industry-standard availability best practices.

  • Experience with automated testing frameworks and practices (e.g., Selenium, integration testing, or chaos engineering).

  • Log parsing and data analysis experience using platforms such as Splunk or ELK.

  • Experience with SQL and relational databases (PostgreSQL, Oracle, etc.).

  • Ability to influence technical change across a large, matrixed organization without direct authority.

For roles in San Francisco and Los Angeles: Pursuant to the San Francisco Fair Chance Ordinance and the Los Angeles Fair Chance Initiative for Hiring, Salesforce will consider for employment qualified applicants with arrest and conviction records.