Lead Engineer - DevOps
Aspire
Software Engineering
Gurugram, Haryana, India
Posted on Mar 11, 2026
About the team:
At Aspire, our core product relies on a robust, scalable, and highly available infrastructure. The DevOps team is the backbone of our engineering success, responsible for building, maintaining, and automating our CI/CD pipelines, cloud infrastructure, and operational tools. We drive a culture of automation, reliability, and security, ensuring our engineers can rapidly and safely deploy code, ultimately delivering a seamless experience to our customers.
Key Responsibilities
- Infrastructure Automation & Management - Lead the design, implementation, and maintenance of our cloud infrastructure (primarily AWS) using Infrastructure as Code (IaC) tools like Terraform or CloudFormation.
- CI/CD Pipeline Ownership - Own and enhance the entire continuous integration and continuous deployment process, ensuring fast, secure, and reliable releases across development, staging, and production environments (using GitHub Actions, Jenkins, or similar).
- System Reliability & Scalability - Drive initiatives to improve system monitoring, alerting, and logging (e.g., Prometheus, Grafana, ELK stack). Implement and manage auto-scaling solutions to ensure high availability and performance under load.
- Security Integration (DevSecOps) - Collaborate with the Security team to integrate security tools (SAST, SCA, vulnerability scanning) directly into the CI/CD pipeline and enforce security best practices across the infrastructure.
- Operational Excellence - Define and track key performance indicators (KPIs) for infrastructure health and deployment velocity. Establish and maintain runbooks, disaster recovery plans, and incident response procedures.
- Mentorship & Leadership - Mentor junior team members, set technical direction, and champion best practices in DevOps, SRE, and cloud-native technologies across the engineering organization.
Minimum qualifications:
- Education & Experience - Bachelor’s degree in Computer Science, Engineering, or equivalent practical experience. 5+ years of progressive hands-on experience in DevOps, SRE, or Infrastructure Engineering, with at least 1 year in a technical leadership role.
- Cloud Expertise (AWS) - Deep practical experience designing, deploying, and managing complex systems in AWS. Strong proficiency with core services like EC2, VPC, IAM, S3, RDS, ECS/EKS.
- Infrastructure as Code (IaC) - Expert-level proficiency with Terraform, CloudFormation, or Ansible for managing infrastructure at scale.
- CI/CD Proficiency - Extensive experience building, optimizing, and maintaining CI/CD pipelines (e.g., GitHub Actions, GitLab CI, Jenkins) for microservices and monolithic applications.
- Containerization & Orchestration - Strong hands-on experience with Docker and Kubernetes (EKS/ECS preferred) for deployment and cluster management.
- Scripting & Automation - Advanced proficiency in scripting languages (Python, Bash, or Go) for system automation, tool development, and API integration.
- Monitoring & Observability - Experience implementing and managing comprehensive monitoring and logging solutions (e.g., Prometheus, Grafana, ELK stack, Datadog) to ensure high visibility into system performance.
Preferred qualifications:
- Networking & Security - Advanced knowledge of networking fundamentals (TCP/IP, DNS, load balancing) and cloud security best practices, including WAF management (e.g., Cloudflare) and security group/NACL design.
- Database Operations - Experience with database administration, scaling, and high-availability configuration for modern databases (e.g., PostgreSQL, MongoDB, Redis).
- Advanced Kubernetes - Experience with service meshes (e.g., Istio), Helm, or advanced cluster autoscaling configurations.
- Compliance & GRC (good to have) - Familiarity with compliance standards (e.g., ISO 27001, SOC 2) and experience implementing automated controls for audit readiness.
- SRE Principles - Deep understanding and application of Site Reliability Engineering (SRE) principles, including SLOs, error budgets, and incident management.