Datadog On-Call Launches to Deliver Observability-Enriched Paging and Unified Incident Management Capabilities

Datadog, Inc., the monitoring and security platform for cloud applications, announced Datadog On-Call, a modern on-call experience with observability-enriched paging and seamless incident management workflows. Datadog On-Call instantly coordinates teams with relevant context for faster issue resolution, better incident control and improved collaboration.

DevOps, SRE, Security and IT Operations teams need to maintain high levels of service, but they face overwhelming alerts, confusion over dynamically shifting service ownership, disjointed paging strategies, coverage gaps and scheduling issues. These challenges make it difficult to understand, prioritize and resolve issues quickly. Traditional on-call systems offer workflows only for paging, while point solutions do not offer observability context, workflows or data, resulting in information gaps that lengthen resolution times.

By unifying observability and paging into one seamless platform, Datadog On-Call solves these issues and eliminates the inefficiencies of multiple disjointed tools, allowing engineers to focus on resolving incidents quickly and effectively without the added stress of switching contexts or missing critical information.

“Over one million people at the best-known and most innovative companies trust our product development software to capture customer requests, create visual roadmaps, and build with confidence. Ensuring responsive, available, and secure services is key to our success. Using Datadog On-Call in concert with the APM and Infrastructure monitoring tools allows our operations team to respond quickly to any issues that arise. Our engineers benefit from having the full context of our system at all times, which enables us to deliver a lovable experience to our customers,” said Chris Waters, PhD, CTO at Aha! Labs Inc.

“Being on-call is one of the most challenging aspects of an engineer’s job, where redundant service configurations between various tools can lead to brittle, error-prone setups. The general overhead of maintaining on-call schedules and the ambiguity around service and team ownership make it a grueling ordeal, especially during critical times,” said Michael Whetten, VP of Product at Datadog. “Datadog On-Call addresses these pain points with a team-centric design that clarifies ownership, reduces redundancy and minimizes errors. This approach ensures that every team member knows their role and responsibilities, leading to quicker and more effective incident response.”

Datadog On-Call helps DevOps, SRE, Security and IT Operations teams:

  • Act Quickly and Stay Informed: Paging with integrated observability and seamless incident management ensures critical insights and data are readily available within a single platform, eliminating the need for context switching.
  • Connect with the Tools They Use Every Day: On-Call integrates with a rich ecosystem of third-party monitoring, alerting and service management tools so teams don’t have to learn new workflows or spend resources on training.
  • Ensure Clear Service and Team Ownership: Break down knowledge silos and avoid confusion by associating teams with their respective services to simplify configuration, address ownership gaps and ensure the right responders are paged during an alert. Instantly trace upstream and downstream services affected by an outage or issue.
  • Implement Intuitive Scheduling and Notifications: Automate scheduling and escalation policies to ensure continuous coverage and timely responses, reducing the burden on individual team members and enhancing overall team coordination.
  • Measure On-Call Performance: Rich and customizable analytics measure on-call performance to help ensure system reliability, improve mean time to resolution and optimize the well-being of on-call teams.

SOURCE: PRNewsWire