Senior Software Engineer - Observability Visibility
Datadog
The Observability Visibility SRE Team is part of the Observability and Resilience Enablement group within the SRE/Security organization. Observability and Resilience Enablement focuses on closing the loop between how Datadog engineers detect and respond to issues and incidents and how those learnings translate into measurable risk reduction and lower customer impact. The Observability Visibility team carries the organization's 100% visibility priority, defining observability and reliability baselines and ensuring services consistently meet them by default through scalable, automated, and sustainable solutions.
As a Senior Software Engineer on this team, you will help define, implement and evolve observability and resilience standards across Datadog's engineering organization. You will build systems, tooling, libraries, and automation that make observability and reliability the default experience for service owners, reducing operational risk while driving adoption and consistency. This role combines software engineering and site reliability engineering to drive measurable improvements in engineering effectiveness and service resilience. You will work closely with SRE, platform and product teams to identify gaps, deliver scalable solutions and ensure long-term coverage and compliance with established standards.
At Datadog, we place value in our office culture - the relationships and collaboration it builds and the creativity it brings to the table. We operate as a hybrid workplace to ensure our Datadogs can create a work-life harmony that best fits them.
What You'll Do:
- Define and evolve observability and resilience baselines, ensuring alignment with measurable risk reduction goals across Datadog services.
- Measure service compliance against established standards, assess risk and remediation complexity, and drive sustainable solutions to close identified gaps.
- Design and deliver scalable observability and reliability capabilities across the software development lifecycle, using automation and AI-driven solutions where appropriate so that service owners meet established standards by default. Partner closely with platform, SRE, product and engineering teams to ensure adoption and sustained coverage.
- Provide technical leadership and day-to-day coaching to team members, accelerating their growth through design reviews, collaborative problem-solving and operational excellence best practices.
Who You Are:
- You have 5+ years of experience in software engineering, site reliability engineering, or a related discipline supporting production systems at scale.
- You have hands-on experience with observability and re
About the company
Datadog
Monitoring and security platform for cloud applications.
Listed via
Greenhouse