Datadog Partnership — Omlet Agent

01 / Dashboard Creation

Dashboards Built in Seconds

Describe what you want to monitor and the agent discovers your metrics, builds widgets, and assembles production-ready Datadog dashboards.

Auto-discovers your metrics and tags
Creates widgets with proper queries (avg, p95, p99)
Builds multi-section dashboards with template variables
Exports shareable dashboard JSON

omlet-agent

you > set up a dashboard for our API latency

Discovering metrics via Datadog API...

Found 14 matching metrics for trace.http.request.duration

Tags: service, resource_name, env, region

Building dashboard widgets...

Creating dashboard via API...

Dashboard created: API Latency Overview

4 widgets · 3 template variables · shared with team
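
As a rough illustration of the "Building dashboard widgets" step, here is a minimal sketch of the kind of dashboard JSON the agent might assemble. It assumes the classic `q` request format of Datadog's v1 dashboards API. The helper name `build_latency_dashboard` and the choice of aggregations are hypothetical and not taken from the agent itself. Nothing is sent over the network; the payload is only built and printed.

```python
import json


def build_latency_dashboard(metric, template_vars, aggregations=("avg", "max")):
    """Assemble a dashboard payload with one timeseries widget per aggregation.

    Hypothetical helper: the agent's real implementation is not shown in the source.
    """
    # "$service,$env" style scope lets the template variables filter every widget.
    scope = ",".join(f"${name}" for name in template_vars)
    widgets = [
        {
            "definition": {
                "type": "timeseries",
                "title": f"{agg} of {metric}",
                "requests": [
                    {"q": f"{agg}:{metric}{{{scope}}}", "display_type": "line"}
                ],
            }
        }
        for agg in aggregations
    ]
    return {
        "title": "API Latency Overview",
        "layout_type": "ordered",
        # One template variable per tag key, defaulting to "all values".
        "template_variables": [
            {"name": name, "prefix": name, "default": "*"} for name in template_vars
        ],
        "widgets": widgets,
    }


dashboard = build_latency_dashboard(
    "trace.http.request.duration", ["service", "env", "region"]
)
print(json.dumps(dashboard, indent=2))
```

Percentile aggregations such as p95 or p99 could be passed in `aggregations` as well, but they only resolve for distribution metrics with percentiles enabled, so the sketch defaults to `avg` and `max`.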

02 / Alert Configuration

Intelligent Alert Setup

The agent analyzes your services, identifies monitoring gaps, and configures Datadog monitors with the right thresholds, notification channels, and escalation paths.

P95 Latency Monitor (Critical)

avg(last_5m):trace.http.request.duration.p95{service:api} > 500ms

Notify: PagerDuty → #ops-critical

Error Rate Spike (Warning)

sum(last_10m):trace.http.request.errors{service:api}.as_rate() > 0.05

Notify: Slack → #eng-alerts

Disk Usage Threshold (Info)

avg(last_15m):system.disk.in_use{*} by {host} > 0.85

Notify: Email → infra-team@company.com

Memory Leak Detection (Warning)

linear(last_1h):system.mem.used{*} by {host}.trend() > 0

Notify: Slack → #infra-alerts
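
The monitor cards above can be sketched as payloads for Datadog's v1 monitor API. This is a hedged example, not the agent's actual output: the helper `build_metric_monitor` is hypothetical, the warning threshold of 0.8 is an assumed value, and the query adds an explicit space aggregator (`avg:`) before the metric name, which the shortened queries on the cards omit.

```python
def build_metric_monitor(name, query, critical, notify, warning=None):
    """Build a metric monitor payload (hypothetical helper, no network calls)."""
    thresholds = {"critical": critical}
    if warning is not None:
        thresholds["warning"] = warning
    return {
        "name": name,
        "type": "metric alert",
        "query": query,
        # @-mentions in the message route the notification.
        "message": f"{name} triggered. {notify}",
        # The critical threshold here should match the value in the query.
        "options": {"thresholds": thresholds, "notify_no_data": False},
    }


disk_monitor = build_metric_monitor(
    name="Disk Usage Threshold",
    query="avg(last_15m):avg:system.disk.in_use{*} by {host} > 0.85",
    critical=0.85,
    warning=0.8,  # assumed value, not from the source
    notify="@infra-team@company.com",
)
print(disk_monitor)
```

Because the query groups `by {host}`, a monitor like this would evaluate each host separately rather than alerting on a fleet-wide average.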

Alert Coverage Analysis

12 Monitors Created · 3 Notification Channels · 2 Composite Monitors · 4 SLO Monitors

Added missing coverage for memory leak detection and disk saturation across 8 hosts

📚 Omlet Agent — Datadog Educator

What's the difference between gauge, rate, and count metric types in Datadog?

Gauge — A snapshot value at a point in time. Your system.mem.used metrics are gauges. They show the current state.

Rate — A per-second derivative. Your trace.http.request.hits is submitted as a rate. Useful for throughput.

Count — A raw total over an interval. Your aws.elb.request_count is a count. Aggregates with sum, not avg.

When should I use .as_rate() vs .as_count() in queries?

Use .as_rate() when you want per-second normalization, which makes intervals of different lengths comparable. Your error rate alert already uses this correctly.

Use .as_count() for raw totals, like "how many errors in the last 5 minutes."
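
To make the distinction concrete, here is a small arithmetic sketch of what the two views mean for the same data. The sample numbers and the 60-second bucket size are made up, and the functions only mimic the idea behind `.as_count()` and `.as_rate()`; they are not Datadog's implementation.

```python
def as_count(bucket_counts):
    """Raw total across all buckets, e.g. 'how many errors in the last 5 minutes'."""
    return sum(bucket_counts)


def as_rate(bucket_counts, bucket_seconds):
    """Per-second value for each bucket: count divided by the bucket length."""
    return [count / bucket_seconds for count in bucket_counts]


# Hypothetical error counts for five consecutive one-minute buckets.
errors_per_minute = [3, 0, 6, 0, 3]

total_errors = as_count(errors_per_minute)
per_second = as_rate(errors_per_minute, bucket_seconds=60)

print(total_errors)  # 12 errors over the five minutes
print(per_second)    # per-second error rate in each bucket
```

The per-second values stay comparable if the bucket size changes, while the raw totals grow with the length of the interval.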

03 / Datadog Education

Learn Datadog as You Go

The agent understands your environment and teaches Datadog concepts using your own metrics, services, and configurations as examples.

Explains metric types, tags, and query syntax
Teaches dashboard and monitor best practices
Walks through Datadog APIs and integrations
Context-aware answers using your actual environment

04 / Recommendations

Actionable Recommendations

The agent audits your Datadog setup and delivers targeted recommendations to improve performance, reduce costs, and close monitoring gaps.

Coverage

Add custom tags to APM traces

Enable service-level filtering by adding environment and team tags to your trace spans.

Performance

Switch from avg to p99 for latency SLOs

Your current SLOs use avg(), which masks tail latency; p99 better reflects the experience of your slowest requests.
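
A quick worked example shows why an average can hide tail latency. The latency samples are invented for illustration, and the `percentile` helper uses the simple nearest-rank definition, which may differ slightly from how Datadog computes percentiles.

```python
import math
import statistics


def percentile(values, pct):
    """Nearest-rank percentile: the value at rank ceil(pct% of n) in sorted order."""
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Hypothetical sample: 98 fast requests and 2 very slow ones (milliseconds).
latencies_ms = [100] * 98 + [2000] * 2

mean_ms = statistics.mean(latencies_ms)
p99_ms = percentile(latencies_ms, 99)

print(mean_ms)  # 138: looks healthy against a 500 ms target
print(p99_ms)   # 2000: the slowest requests are far over the target
```

An SLO on the mean would pass here even though 2% of requests take two seconds.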

Cost

Enable log-based metrics

Replace 3 high-volume log queries with log-based metrics to reduce indexing costs by ~40%.

Coverage

Add anomaly detection monitors

5 services have no anomaly detection. Static thresholds miss gradual performance degradation.
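
The point about static thresholds can be illustrated with a toy comparison. This is only a sketch: the series is synthetic, the window and ratio are arbitrary, and the baseline check is a simple stand-in, not the algorithm Datadog's anomaly monitors use.

```python
import statistics


def exceeds_static(series, threshold):
    """True if any point crosses the fixed threshold."""
    return any(value > threshold for value in series)


def drifted_from_baseline(series, window, ratio):
    """True if the most recent window averages more than ratio x the first window."""
    baseline = statistics.mean(series[:window])
    recent = statistics.mean(series[-window:])
    return recent > baseline * ratio


# Synthetic latency that creeps up 10 ms per sample: 200 ms to 440 ms.
latency_ms = [200 + 10 * i for i in range(25)]

print(exceeds_static(latency_ms, threshold=500))             # False
print(drifted_from_baseline(latency_ms, window=5, ratio=1.5))  # True
```

Latency more than doubles over the series, yet a 500 ms static threshold never fires; a check relative to the earlier baseline does.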

Optimization Summary

~$2.4k Monthly Savings · 47 Unused Metrics · 8 Missing Monitors · 34% Tag Cardinality Reduction

Implementing all recommendations would save ~$2.4k/mo and improve alert coverage by 67%

05 / Environment Optimization

Optimize Your Datadog Spend

The agent continuously audits your Datadog environment to eliminate waste, right-size configurations, and keep costs predictable.

Identifies unused custom metrics
Optimizes log pipelines and exclusion filters
Right-sizes APM sampling rates
Audits tag cardinality to prevent cost spikes
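
A tag cardinality audit can be sketched as counting distinct tag combinations per metric, since each unique combination of metric name and tag values is counted as a separate custom metric. The input format, metric names, and function names below are hypothetical; a real audit would read this data from Datadog rather than from a hard-coded list.

```python
from collections import defaultdict


def series_per_metric(points):
    """Count distinct tag-value combinations seen for each metric name."""
    combos = defaultdict(set)
    for metric, tags in points:
        combos[metric].add(tuple(sorted(tags.items())))
    return {metric: len(seen) for metric, seen in combos.items()}


def values_per_tag(points, metric):
    """Count distinct values per tag key for one metric, to spot the costly tag."""
    values = defaultdict(set)
    for name, tags in points:
        if name == metric:
            for key, value in tags.items():
                values[key].add(value)
    return {key: len(seen) for key, seen in values.items()}


# Hypothetical submissions: (metric name, tags).
points = [
    ("checkout.latency", {"env": "prod", "user_id": "u1"}),
    ("checkout.latency", {"env": "prod", "user_id": "u2"}),
    ("checkout.latency", {"env": "prod", "user_id": "u3"}),
    ("checkout.latency", {"env": "prod", "user_id": "u1"}),  # repeat, same series
    ("queue.depth", {"env": "prod"}),
    ("queue.depth", {"env": "staging"}),
]

print(series_per_metric(points))
print(values_per_tag(points, "checkout.latency"))
```

In this sample the unbounded `user_id` tag drives the series count for `checkout.latency`, which is the kind of tag such an audit would flag.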

app.datadoghq.com/account/usage

1,247 Custom Metrics (↓ 12% after audit)
8.2 GB Daily Log Volume (↓ 34% with filters)
94% APM Trace Coverage (↓ optimized sampling)

Monthly Cost by Category (chart): Infra, Logs, APM, RUM, Metrics, Synth

Your Datadog.
Supercharged.

Transform Your Datadog
Experience
