Staff Site Reliability Engineer, Core AI Infrastructure

Remote - USA

Ready to do the most impactful work of your career? At Coinbase, we are uncompromising on our mission to increase economic freedom. The bar is high, the environment is intense, and we like it that way. This isn't a place for complacency; it’s a place to be pushed past your perceived limits. If you're ready to build the future of finance alongside people who refuse to settle for "good enough," you belong here. Coinbase is a remote-first, but not remote-only, company. Expect to get together quarterly for intense in-person working sessions called “surges.” Learn more about working at Coinbase.

You'll join a high-performing team of engineers driving AI transformation at Coinbase as a Staff Site Reliability Engineer on the IT Operations team. This team builds and scales the infrastructure powering Coinbase's AI products, with direct exposure to senior leadership in a fast-paced, incubator-style environment. You'll own the reliability and automation of critical AI infrastructure, ensuring our systems are resilient, observable, and secure at scale.

What you’ll be doing (i.e., job duties):

Own the reliability, monitoring, and incident response lifecycle for AI infrastructure services, including on-call support for AWS deployment pipelines, root cause analysis, and blameless retros.
Build automation and tooling to streamline operational IT workflows, eliminate manual tasks, and improve deployment velocity across CI/CD frameworks and Kubernetes environments.
Partner with the Coinbase Infrastructure team to extend CI/CD frameworks supporting IT services and enterprise network platforms, and with Security and Compliance to integrate surveillance tooling into deployment pipelines.
Strengthen observability and documentation standards across IT engineering by defining metrics, implementing monitoring solutions, and maintaining technical documentation that sets a standard of excellence.
Develop full-stack applications that power internal AI products and infrastructure with Go or Python.

What we look for in you (i.e., job requirements):

8+ years of experience automating and supporting cloud infrastructure (AWS) and network environments, with hands-on use of infrastructure-as-code tools (Terraform, Ansible, Chef, Puppet, or Salt).
Proven experience deploying, managing, and troubleshooting containerized workloads u

About the company

Coinbase

Cryptocurrency exchange and wallet.

Listed via

Greenhouse
