Job Description
Company: Sumo Logic, Inc.
Location: Remote – India
Employment Type: Full-Time
About Sumo Logic
Sumo Logic empowers modern digital businesses with real-time analytics for observability and security. Our SaaS platform enables customers to ensure application reliability, secure cloud-native applications, and gain deep insights into their infrastructure. We work with customers worldwide, providing planet-scale, cloud-native solutions.
Role Overview
We are seeking a Site Reliability Engineer – II to own and improve availability, the most important product feature, across Sumo Logic’s observability and security products. You will work alongside a global SRE team, focusing on operations optimization, cloud efficiency, security hardening, and accelerating developer velocity.
This role requires hands-on experience with cloud-native systems, automation, and microservices lifecycle management, along with strong collaboration skills to support engineering teams.
Key Responsibilities
- Maintain and execute a product-area reliability roadmap to improve reliability, security, efficiency, and developer velocity.
- Collaborate with development infrastructure, global SRE, and product teams to refine operational practices.
- Define, evolve, and manage SLOs for multiple engineering teams.
- Participate in on-call rotations, analyze operational workload, and improve the on-call experience.
- Optimize microservices and architectural components throughout their lifecycle.
- Write automation to reduce toil, enhance efficiency, improve security, and accelerate feature delivery.
- Facilitate root cause analysis and drive continuous improvement for incidents.
- Scale and evolve systems sustainably through automation and process improvements.
- Work in a fast-paced, iterative environment and contribute to cross-functional initiatives.
Required Qualifications & Skills
- Cloud-native application development experience with modern design patterns.
- Strong debugging and troubleshooting skills across the technology stack.
- Deep knowledge of AWS Networking, Compute, Storage, and managed services.
- Experience with CI/CD and infrastructure tooling, including Kubernetes, Terraform, Ansible, and Jenkins.
- Full lifecycle support of services, from development to production.
- Proficiency in Infrastructure as Code (Terraform or CloudFormation).
- Production-ready coding skills in Java, Scala, or Go.
- Solid Linux experience and comfort with command-line tools.
- Familiarity with modern cloud-native security practices.
- Experience working in Agile frameworks (Scrum, Kanban).
- Bachelor’s or Master’s degree in Computer Science, Electrical Engineering, or a related field.
- 2+ years of industry experience.
Desirable Skills
- Experience with Sumo Logic or other observability platforms.
- Experience with planet-scale product development and SaaS operations on AWS.
- Knowledge of streaming technologies (Kafka, Kafka Streams, KSQL).
- Advanced coding skills in Java, Go, Scala, or Python.
- Advanced infrastructure automation using Terraform, Jenkins, and Kubernetes.
- Experience tuning JVM workloads at scale.
Why Join Sumo Logic?
- Contribute to a planet-scale SaaS observability platform.
- Work remotely from anywhere in India while collaborating globally.
- Solve challenging reliability and security problems in a fast-growing tech company.
- Be part of a culture that values innovation, collaboration, and continuous learning.