Site Reliability Engineer, GovCloud Incident Response (GIR)

Own Company

Software Engineering
Denver, CO, USA
Posted on Feb 19, 2026

Applications will be accepted until 08/22/2026.

Description

Join our team and contribute to the operational excellence of Salesforce GovCloud!

Are you passionate about ensuring the reliability and performance of mission-critical cloud services? Salesforce is seeking a talented Site Reliability Engineer to join our dynamic team in our Denver, CO, location, supporting our GovCloud environment. As a key member of our Site Reliability organization, you'll play a vital role in maintaining 99.99% uptime for customer-facing services, proactively addressing issues, and ensuring the security of our data. We foster a collaborative and innovative culture, where you’ll work alongside skilled engineers to solve complex problems and drive continuous improvement.

Please Note: This position requires a successful background investigation and the ability to obtain and maintain a specific level of U.S. government background clearance. Details will be provided during the interview process.

Shift Requirements: This role involves shift work, including night shifts, as part of a 24/7 support team. We provide a rotating schedule and ensure adequate compensation for shift differentials.

About the Role:

The Site Reliability team at Salesforce is the backbone of our cloud operations, working around the clock to keep our services available and our customers protected. You will be a crucial part of the GovCloud Incident Response (GIR) team, which maintains the current infrastructure through day-to-day alert response, smart hands support, and comprehensive incident management, including retrospectives and long-term remediation.

Your Responsibilities:

  • Ensure 99.99% uptime for customer-facing services by proactively monitoring and maintaining the health of supporting systems, contributing directly to customer satisfaction and trust.

  • Act in key support roles during major incidents (e.g., Sev0, Sev1) and participate in technical incident reviews for problem management.

  • Contribute to Problem Management by populating and participating in Root Cause Analyses (RCAs) and handing them off to the Global Solutions team.

  • Ensure all work carried out by the Site Reliability team aligns with the company’s internal compliance policies and directives.

  • Collaborate with technical staff to solve complex technical issues and customer concerns.

  • Lead and mentor other team members in staying abreast of industry innovations and technologies, and support the team's development and growth.

  • Thrive in a fast-paced environment, solving sophisticated issues quickly and successfully balancing multiple priorities.

  • Automate the detection and resolution of recurring issues in the production environment.

  • Help create and improve current processes to reduce operational and engineering toil, including the implementation of AI-driven automation for routine tasks.

Basic Requirements:

  • Citizenship: U.S. citizen (U.S. born or naturalized) who does not hold dual citizenship. You agree to complete a Minimum Background Investigation (MBI) for a Moderate Public Trust position with the U.S. federal government or other clearances as deemed appropriate for the role.

  • Education: Bachelor’s degree in Computer Science, Engineering, Information Technology, or a related technical field.

  • Experience: Systems engineering experience in an enterprise-scale internet service engineering or support role.

  • Technical Skills:

    • Expertise in TCP/IP related technologies (networking protocols, network programming, etc.).

    • Expertise in CLI enterprise support of Unix variants (Linux/Solaris/BSD), with significant exposure to Red Hat Enterprise Linux and Solaris.

    • Strong understanding of monitoring security systems and administration.

    • Experience provisioning, operating, and running AWS/C2S based infrastructure and systems.

    • Proficiency in scripting with Python, Go, or other languages.

  • Communication: Strong written and oral communication skills.

  • Incident Management: Past experience in Incident Management and a good understanding of ITIL service operations.

  • Availability: Ability to participate in a 24/7 on-call rotation supporting large data center operations and be available for shift work.

Preferred Qualifications:

  • Prior experience with Chef/Puppet or automated deployment. (This helps streamline our infrastructure management.)

  • Prior experience with Jenkins/Bamboo/Spinnaker pipeline execution. (This aids in our continuous integration and deployment processes.)

  • Experience supporting and maintaining monitoring and alert systems. (Ensures proactive issue detection.)

  • Experience supporting and maintaining Java applications. (Supports our application stack.)

  • Hands-on experience configuring and running AWS (Amazon Web Services) using the CLI/SDKs. (Essential for our cloud infrastructure.)

  • Linux+, Red Hat, and AWS certifications. (Validates technical expertise.)

  • Experience supporting and leading Kubernetes-based applications and services. (Supports our containerized environment.)

  • Familiarity with Agile processes and DevOps practices. (Enables efficient workflow and collaboration.)

  • Experience participating in blameless retrospectives, learning from incidents, and conducting post-incident investigations, with an interest in how AI can assist in root cause analysis and pattern identification. (Promotes a culture of continuous improvement.)

  • Working knowledge of and interest in resilience engineering, including concepts such as Safety II and proactive problem prevention, leveraging AI for proactive risk identification and system optimization. (Enhances system reliability.)

  • Experience with AI/ML concepts and tools for operational insights, predictive maintenance, or intelligent automation.

  • Familiarity with data analysis and visualization tools to interpret AI-generated insights.

Candidates must be U.S. citizens (U.S. born or naturalized) who do not hold dual citizenship and must agree to complete a U.S. federal government Minimum Background Investigation (MBI) for a Moderate Public Trust position.

Apply now to join our dynamic team and help us drive incident response efficiency and system resilience.