Datadog

Senior Software Engineer - Observability Visibility

New York, New York, USA · On-site · 2w ago

The Observability Visibility SRE Team is part of the Observability and Resilience Enablement group within the SRE/Security organization. Observability and Resilience Enablement focuses on closing the loop between how Datadog engineers detect and respond to issues and incidents and how those learnings translate into measurable risk reduction and lower customer impact. The Observability Visibility team carries the organization's 100% visibility priority, defining observability and reliability baselines and ensuring services consistently meet them by default through scalable, automated, and sustainable solutions.

As a Senior Software Engineer on this team, you will help define, implement, and evolve observability and resilience standards across Datadog's engineering organization. You will build systems, tooling, libraries, and automation that make observability and reliability the default experience for service owners, reducing operational risk while driving adoption and consistency. This role combines software engineering and site reliability engineering to drive measurable improvements in engineering effectiveness and service resilience. You will work closely with SRE, platform, and product teams to identify gaps, deliver scalable solutions, and ensure long-term coverage and compliance with established standards.

At Datadog, we place value in our office culture - the relationships and collaboration it builds and the creativity it brings to the table. We operate as a hybrid workplace to ensure our Datadogs can create a work-life harmony that best fits them.

What You'll Do:

  • Define and evolve observability and resilience baselines, ensuring alignment with measurable risk reduction goals across Datadog services.
  • Measure service compliance against established standards, assess risk and remediation complexity, and drive sustainable solutions to close identified gaps.
  • Design and deliver scalable observability and reliability capabilities across the software development lifecycle, leveraging automation and AI-driven solutions where appropriate so that service owners meet established standards by default. Partner closely with platform, SRE, product, and engineering teams to ensure adoption and sustained coverage.
  • Provide technical leadership and day-to-day coaching to team members, accelerating their growth through design reviews, collaborative problem-solving, and operational excellence best practices.

Who You Are:

  • You have 5+ years of experience in software engineering, site reliability engineering, or a related discipline supporting production systems at scale.
  • You have hands-on experience with observability and re

About the company

Datadog

Monitoring and security platform for cloud applications.

Listed via

Greenhouse