Staff Site Reliability Engineer (K8s platform)
Okta
Secure Every Identity, from AI to Human
Identity is the key to unlocking the potential of AI. Okta secures AI by building the trusted, neutral infrastructure that enables organizations to safely embrace this new era. This work requires a relentless drive to solve complex challenges with real-world stakes. We are looking for builders and owners who operate with speed and urgency and execute with excellence.
This is an opportunity to do career-defining work. We're all in on this mission. If you are too, let's talk.
Okta Workforce Identity Cloud (WIC) provides easy, secure access for your workforce so you can focus on other strategic priorities, such as reducing costs and doing more for your customers.
If you like to be challenged and have a passion for solving large-scale automation, testing, and tuning problems, we would love to hear from you. The ideal candidate is someone who exemplifies the ethic, “If you have to do something more than once, automate it,” and who can rapidly self-educate on new concepts and tools.
Position Overview:
The Staff Site Reliability Engineer (SRE) will play a key role in building and managing Kubernetes platforms that support cloud-native applications and services. This position focuses on architecting and managing reliable, scalable, and secure Kubernetes-based platforms on AWS, ensuring high availability and performance while optimising costs and automation. The ideal candidate will have hands-on experience with AWS infrastructure, Kubernetes platform creation, Helm charts, Karpenter scaling, and Istio service mesh.
Key Responsibilities:
- Kubernetes Platform Creation: Design, implement, and maintain highly available, scalable, and fault-tolerant Kubernetes platforms. Ensure clusters are optimised for production workloads, providing high resilience and operational efficiency.
- AWS Infrastructure Management: Build, manage, and optimise AWS cloud infrastructure, including EKS, ECS, S3, VPCs, RDS, IAM, and more. Implement best practices for cost management, scaling, and security within AWS.
- Helm Management: Utilise Helm to automate and streamline the deployment of applications and services to Kubernetes clusters. Create, maintain, and manage Helm charts for production-ready deployments.
- Karpenter Implementation: Implement and manage Karpenter to dynamically scale Kubernetes clusters in response to workload demands.
- Istio Service Mesh Management:
Listed via
Greenhouse