Software Engineering SMTS

Own Company

Software Engineering
New York, NY, USA
Posted on Feb 24, 2026

Description

Job Title: SMTS, Site Reliability Engineering

About the Role:

The Site Reliability Engineering team is part of the Digital Enterprise Technology Platform Engineering organization. It maintains and develops the IT monitoring and log analytics platform that keeps Enterprise IT services reliable.

We're looking for a self-starter who can take ownership of tasks, work under pressure, and balance multiple assignments while maintaining a positive outlook. You'll contribute ideas and feedback on the vision for IT monitoring systems, and provide expertise for IT projects and enhancements across various IT organizations.

Responsibilities:

  • Manage, assess, plan, and support core observability platform operations

  • Lead process changes and implementations related to the monitoring platform

  • Provide escalation support for configuration and platform issues, participating in on-call schedules to resolve major incidents

  • Collaborate with key stakeholders (Service Managers, Product Managers, Application Architects, Business Support, and Operations) to gather and develop requirements

  • Develop AI, automation, and integrations to deliver custom monitoring requirements

  • Work with third-party vendors and partners to address platform-related enhancements

  • Support and manage the introduction of new monitoring tools and orchestrate migrations as aging software is retired

  • Periodically present reports on monitoring event and correlation metrics to the Enterprise Operations team

  • Work within an Agile Scrum methodology and provide guidance to junior team members

  • Create standard operating procedures and share them with the team for effective execution

Minimum Qualifications:

  • Bachelor's degree in Computer Science or related technical field, or equivalent experience in technical leadership

  • 5-8 years of experience designing and implementing distributed systems to handle large-scale telemetry and log data

  • Demonstrable ability in Bash/PowerShell, Python, and JavaScript (Node.js), especially in reading and understanding existing code

  • Understanding of REST-based API design principles and best practices

  • Experience with server administration (Linux and Windows)

  • Knowledge of monitoring tools like Zabbix, Splunk, Grafana, New Relic, or ThousandEyes

  • Experience with AWS public cloud and VMware vSphere

  • Knowledge of configuration management and orchestration tools like Puppet, Ansible, or Terraform

  • Experience with Docker and containerized applications

  • Strong troubleshooting and debugging skills (reading log files, analyzing memory leaks)

  • Strong analytical skills and ability to gather and synthesize data for review

  • Ability to problem-solve in a fast-paced environment and shift gears effectively

  • Subject matter expertise in at least one monitoring and telemetry product

Preferred Qualifications:

  • Experience with AI and machine learning applications in operations

  • Experience with predictive monitoring and auto-healing solutions

  • Master's degree in Computer Science or related field

  • Experience translating technical concepts into visual representations

For roles in San Francisco and Los Angeles: Pursuant to the San Francisco Fair Chance Ordinance and the Los Angeles Fair Chance Initiative for Hiring, Salesforce will consider for employment qualified applicants with arrest and conviction records.