Staff Applied Scientist - Agentic Interfaces

Datadog
New York, New York, USA | On-site | 1mo ago

Team description

At Datadog, AI agents are becoming first-class consumers of observability, security, and software delivery data — from third-party coding agents like Claude Code, Cursor, and Copilot, to our own Bits SRE, Bits Assistant, and Bits Dev Agent. The Agentic Interfaces team owns the platform that connects these agents to Datadog: the MCP Server, the tools and retrieval surfaces agents call into, and — critically — the evaluation systems that tell us whether an agent's experience on Datadog data is actually getting better over time.

This role is about that last piece. We're hiring a Staff Applied Scientist to define what "good" means for an agentic interface at Datadog and to build the measurement systems that hold us to it. "Good" isn't one number: it spans answer quality, tool-selection accuracy, retrieval relevance, latency, token cost, and end-to-end agent success on real customer workflows. You'll design the evals, build the datasets, define the metrics, and partner with the AI engineers on the team to land the platform that lets every product group at Datadog ship integrations that are demonstrably better release over release.

The space is full of open research questions. How do you evaluate an agent end-to-end when the trajectory is non-deterministic? How do you score tool selection when the tool catalog has hundreds of entries and grows weekly? How do you build a measurement system that catches regressions across first-party and third-party agents at once, without each team writing their own harness? If those are the problems you want to spend your time on, come build this with us.

Datadog values people from all walks of life. We understand not everyone will meet all the above qualifications on day one. That's okay. If you’re passionate about technology and want to grow your skills, we encourage you to apply.

What You’ll Do:

  • Own the evaluation strategy for Datadog's AI agent integrations. Define the metrics — offline and online, quality and cost, single-turn and trajectory-level — that the team and the broader organization optimize against.

  • Build the eval datasets, golden traces, and regression harnesses that catch quality changes before they hit customers, and make those assets reusable by every team contributing tools to the platform.

  • Drive measurable improvements to retrieval relevance, tool-selection accuracy, and context efficiency, partnering closely with the AI engineers on the team who build the underlying platform.

About the company

Datadog

Monitoring and security platform for cloud applications.