Site Reliability Engineer (Hosted Infra) - Platform

United States · On-site · 2w ago

Elastic, the Search AI Company, enables everyone to find the answers they need in real time, using all their data, at scale — unleashing the potential of businesses and people. The Elastic Search AI Platform, used by more than 50% of the Fortune 500, brings together the precision of search and the intelligence of AI to enable everyone to accelerate the results that matter. By taking advantage of all structured and unstructured data — securing and protecting private information more effectively — Elastic’s complete, cloud-based solutions for search, security, and observability help organizations deliver on the promise of AI.

What is the role

We are Cloud Infrastructure SREs who integrate, scale, and evolve multi-cloud infrastructure across 4 Cloud Service Providers, 70+ globally distributed regions, and tens of thousands of hosts to power Elastic Cloud. We tackle hard problems at scale through automation, Infrastructure as Code (IaC), configuration management, and purpose-built software that eliminates toil and improves reliability.

We're also a team that grows people as well as systems. If that challenge genuinely excites you, we'd love to hear from you.

What you will be doing

Engineering software to automate large-scale systems — building internal tools and services, not just running scripts.
Optimizing the reliability and lifecycle of hosts across multiple cloud providers.
Strengthening our observability posture — crafting alerting and monitoring systems that drive incident prevention over incident response.
Scaling global infrastructure and evolving the infrastructure management processes to meet growing demand.
Contributing to code reviews, sharing your work, planning what we need to do next, and both mentoring and being mentored by teammates.
Being part of a balanced SRE on-call rotation: responding to incidents, improving runbooks, participating in postmortems, and championing reliability improvements.

What you bring

Experience building software with Golang. You are also comfortable reviewing others' code and offering constructive feedback.
Production experience operating large-scale cloud compute (hundreds of hosts or more) via automated workflows.
Deep experience with Linux systems — you are at home in the terminal debugging at the OS level.
Proficiency working with containerized workloads in production.
A customer-first, systems-thinking approach to operational problems — you care about root causes, not just s

About the company

Elastic

Search and observability company.

Listed via Greenhouse
