Principal Site Reliability Engineer

Remote US

Location: Fully Remote within North America

We’re looking for a Principal Site Reliability Engineer (SRE) to join our growing infrastructure team and help ensure the reliability, scalability, and performance of our cloud-native applications. In this role, you will be instrumental in managing and evolving our Kubernetes-based infrastructure, optimizing cloud costs, and improving development workflows through automation and reliability engineering.

What You’ll Do

Own the stability and performance of our entire software platform, with a core focus on our EKS-hosted Kubernetes clusters.
Drive cost-efficiency across all infrastructure layers: identify opportunities for savings and lead their implementation.
Design, implement, and maintain CI/CD pipelines for seamless deployment of applications across environments.
Manage and enhance our infrastructure-as-code using tools like Terraform and Helm.
Build and maintain systems for observability, including alerting, dashboards, runbooks, and auto-healing mechanisms.
Collaborate closely with engineers, BI analysts, and stakeholders across all departments to ensure infrastructure is aligned with business and technical goals.
Champion best practices for SRE, DevOps, and cloud-native development.

What We’re Looking For

Required Skills & Experience

6+ years of hands-on experience in an SRE, DevOps, or similar infrastructure-focused role (4+ years for Senior-level candidates, 2+ years for Mid-level).
Deep expertise with Kubernetes and AWS (EKS experience highly preferred).
Experience with Docker and container orchestration.
Proficient with Bash scripting and comfortable troubleshooting in Linux environments.
Working knowledge of CI/CD practices, ideally using GitHub Actions and ArgoCD.
Infrastructure-as-Code experience, particularly with Terraform.
Excellent communication and problem-solving skills, with a strong emphasis on cross-functional collaboration.

Nice to Haves

Python or Go programming experience.
Familiarity with observability tools like Prometheus, VictoriaMetrics, and Grafana.
Knowledge of Karpenter, Helm, and autoscaling practices within EKS.
Experience with distributed systems and microservices architecture.
A strong eye for optimizing both system performance and cloud costs.

Soft Skills

A self-starter with the ability to learn independently.
A natural collaborator who thrives in a team-oriented environment.
Strong analytical thinking with a practical approach to solving complex infrastructure problems.
Clear and effective communicator, both written and verbal.

Why Join Us?
You’ll be joining a fast-paced, collaborative, and supportive environment where infrastructure is recognized as a core part of product delivery. You’ll have the opportunity to make a major impact on platform reliability, developer productivity, and cost-efficiency.

Apply today to help us shape the future of our platform and build systems that scale with our growth!
