Principal Site Reliability Engineer

Remote US

Location: Fully Remote within North America

We’re looking for a Principal Site Reliability Engineer (SRE) to join our growing infrastructure team and help ensure the reliability, scalability, and performance of our cloud-native applications. In this role, you will be instrumental in managing and evolving our Kubernetes-based infrastructure, optimizing cloud costs, and improving development workflows through automation and reliability engineering.

What You’ll Do

  • Own the stability and performance of our entire software platform, with a core focus on our EKS-hosted Kubernetes clusters.
  • Drive cost-efficiency across all infrastructure layers: identify opportunities for savings and lead their implementation.
  • Design, implement, and maintain CI/CD pipelines for seamless deployment of applications across environments.
  • Manage and enhance our infrastructure-as-code using tools like Terraform and Helm.
  • Build and maintain systems for observability, including alerting, dashboards, runbooks, and auto-healing mechanisms.
  • Collaborate closely with engineers, BI analysts, and stakeholders across all departments to ensure infrastructure is aligned with business and technical goals.
  • Champion best practices for SRE, DevOps, and cloud-native development.

What We’re Looking For

Required Skills & Experience

  • 6+ years of hands-on experience in an SRE, DevOps, or similar infrastructure-focused role (4+ for Senior, 2+ for Mid).
  • Deep expertise with Kubernetes and AWS (EKS experience highly preferred).
  • Experience with Docker and container orchestration.
  • Proficient with Bash scripting and comfortable troubleshooting in Linux environments.
  • Working knowledge of CI/CD practices, ideally using GitHub Actions and ArgoCD.
  • Infrastructure-as-Code experience, particularly with Terraform.
  • Excellent communication and problem-solving skills, with a strong emphasis on cross-functional collaboration.

Nice to Haves

  • Python or Go programming experience.
  • Familiarity with observability tools like Prometheus, Victoria Metrics, and Grafana.
  • Knowledge of Karpenter, Helm, and autoscaling practices within EKS.
  • Experience with distributed systems and microservices architecture.
  • A strong eye for optimizing both system performance and cloud costs.

Soft Skills

  • A self-starter with the ability to learn independently.
  • A natural collaborator who thrives in a team-oriented environment.
  • Strong analytical thinking with a practical approach to solving complex infrastructure problems.
  • Clear and effective communicator, both written and verbal.

Why Join Us?
You’ll be joining a fast-paced, collaborative, and supportive environment where infrastructure is recognized as a core part of product delivery. You’ll have the opportunity to make a major impact on platform reliability, developer productivity, and cost-efficiency.

Apply today to help us shape the future of our platform and build systems that scale with our growth!