AI SRE Agent OS

Your Infrastructure.
Its Own Operator.

An autonomous AI agent that investigates incidents, writes and deploys tools, runs scheduled operations, and generates rich interactive reports—all from a single interface.

See How It Works · Get Started
Investigate with full tool access
Create tools on the fly
Schedule autonomous operations
Visualize everything interactively
01 / Coding CLI

Full Shell Power.
Zero Constraints.

A WebGL-accelerated terminal running a real PTY session in your browser. The agent has full shell access—it reads files, runs commands, executes scripts, pipes output, and navigates your codebase exactly like a senior engineer sitting at the keyboard.

Full interactive PTY via xterm.js with WebGL rendering
Persistent sessions—switch tabs, come back, output is still there
Run any command: git, docker, kubectl, curl, custom scripts
Multi-provider: Claude Code, Cursor CLI, or Codex—toggle in one click
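How might persistent sessions work under the hood? A minimal sketch, assuming the server keeps a capped scrollback buffer per session and replays it when a tab reconnects. The names (`SessionBuffer`, `MAX_LINES`) are hypothetical, not the product's actual API.

```javascript
// Illustrative: server-side scrollback so a reconnecting tab can
// replay terminal output it missed. Cap keeps memory bounded.
const MAX_LINES = 10000;

class SessionBuffer {
  constructor() {
    this.lines = [];
  }
  // Append a line of PTY output; evict the oldest past the cap.
  push(line) {
    this.lines.push(line);
    if (this.lines.length > MAX_LINES) {
      this.lines.splice(0, this.lines.length - MAX_LINES);
    }
  }
  // On reconnect, replay the full retained scrollback.
  replay() {
    return this.lines.join("\n");
  }
}

const session = new SessionBuffer();
session.push("$ kubectl get pods -n payments");
session.push("payment-api-7d4f8b-x2k9z   1/1   Running");
console.log(session.replay());
```

The real system attaches this to a PTY stream; the eviction policy is one reasonable choice, not the only one.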
omlet — production-api
omlet> The payment service is returning 504s. Investigate and fix it.
Starting investigation. Let me check the service status and recent logs.
$ kubectl get pods -n payments
$ kubectl logs payment-api-7d4f8b-x2k9z --tail=200
$ curl -s http://payment-api:8080/healthz
Found it. The connection pool to Postgres is exhausted—max connections hit at 14:23 UTC after a deploy introduced an N+1 query in the order lookup path.
✓ Identified root cause: N+1 query in OrderService.getByUser()
✓ Applied fix: batch query with WHERE user_id IN (...)
✓ Connection pool recovering: 2/50 → 47/50 available
✓ 504 rate: 23% → 0.1%
omlet> _
02 / Deep Investigation

Watch the Agent Think

Every file read, grep, command, and sub-agent call is visible in real time. The agent dynamically chains tools—spawning sub-agents for parallel investigation, correlating metrics, logs, and traces—and the system automatically extracts key insights, errors, and root causes from the full investigation trail.
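The "session intelligence" idea above can be sketched as a fold over the investigation trail: tally tool calls by name and surface events flagged as insights. The event shape (`{ type, tool, text }`) is an assumption for illustration, not the real schema.

```javascript
// Illustrative: summarize an investigation trail into tool counts
// and extracted insights, like the session panel shown above.
function summarizeTrail(events) {
  const toolCounts = {};
  const insights = [];
  for (const ev of events) {
    if (ev.type === "tool_call") {
      toolCounts[ev.tool] = (toolCounts[ev.tool] || 0) + 1;
    } else if (ev.type === "insight") {
      insights.push(ev.text);
    }
  }
  const totalCalls = Object.values(toolCounts).reduce((a, b) => a + b, 0);
  return { toolCounts, insights, totalCalls };
}

const trail = [
  { type: "tool_call", tool: "Bash" },
  { type: "tool_call", tool: "Bash" },
  { type: "tool_call", tool: "Read" },
  { type: "insight", text: "Redis cache hit ratio dropped from 94% to 12%" },
];
console.log(summarizeTrail(trail).totalCalls); // 3
```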

User
"Latency on checkout-service spiked 3x in the last hour. Find the root cause."
Tool Group — 14 calls
Bash ×6 Read ×4 Grep ×3 Sub-Agent ×1
Queried Prometheus metrics, tailed service logs, traced slow requests, spawned sub-agent for DB analysis
Insight Detected
"Redis cache hit ratio dropped from 94% to 12% at 14:47 UTC after config deploy removed TTL settings"
Tool Group — 8 calls
Edit ×3 Bash ×5
Restored TTL config, restarted cache, verified hit ratio recovery
Resolved
"Cache hit ratio restored to 91%. P99 latency: 2.4s → 180ms. All alerts cleared."
Session Intelligence
22 Tool Calls · 3 Outputs · 1 Sub-Agent · 0 Errors
Work Phases
Investigation ×2 Code Change ×1 Verification ×1
Root Cause
Config deploy at 14:47 UTC removed Redis TTL settings, causing cache miss storm. All requests fell through to Postgres, exhausting the connection pool and spiking P99 latency to 2.4s.
pagerduty-mcp
stdio
npx @pagerduty/mcp-server --api-key $PD_KEY
list_incidents acknowledge resolve create_note
k8s-operator
http
http://k8s-mcp.internal:8080/mcp
get_pods scale_deployment rollback get_events exec_pod
runbook-agent
sse
Custom serverless function: auto-triage alerts using runbook knowledge base
triage_alert search_runbooks suggest_fix
03 / Tool Creation

Builds Its Own Tools

The agent isn't limited to built-in commands. Connect any MCP server—PagerDuty, Kubernetes, Datadog, your internal APIs—or write custom serverless functions that the agent can invoke on demand. Test connectivity, discover available tools, and configure granular permissions, all from the UI.

Add MCP servers via form builder or JSON import—stdio, HTTP, SSE
Discover tools: see every function an MCP server exposes before deploying
Write serverless JS/AI functions with cron triggers and OTEL tracing
Granular permissions: allow kubectl get but block kubectl delete
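The granular-permission bullet can be sketched as an ordered prefix-rule list where the first match wins and the default is deny. The rule shape and matching logic here are illustrative assumptions, not the product's actual policy engine.

```javascript
// Illustrative: allow `kubectl get` but block `kubectl delete`.
// More specific rules are listed first; first match wins.
const rules = [
  { pattern: "kubectl delete", action: "deny" },
  { pattern: "kubectl", action: "allow" },
  { pattern: "git", action: "allow" },
];

// Default-deny: a command with no matching rule is blocked.
function isAllowed(command) {
  for (const rule of rules) {
    if (command.startsWith(rule.pattern)) {
      return rule.action === "allow";
    }
  }
  return false;
}

console.log(isAllowed("kubectl get pods -n payments"));   // true
console.log(isAllowed("kubectl delete pod payment-api")); // false
console.log(isAllowed("rm -rf /"));                       // false
```

Ordering matters: if the bare `kubectl` rule came first, the delete rule would never fire.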
04 / Task System

Scheduled Autonomous Ops

Define agent tasks in plain markdown. Schedule them on any cron cadence—hourly health checks, daily incident summaries, weekly security audits. The agent runs autonomously, resumes from previous sessions to maintain context, and auto-expires when the job is done.

tasks.md
## Task: Morning Health Check
**Prompt:** "Check all production services,
report any anomalies, and generate
an HTML status dashboard"
**Folder:** /ops/health-checks
**Schedule:** 0 8 * * *
**Timezone:** America/Los_Angeles
**SessionId:** a1b2c3d4
## Task: Security Scan
**Prompt:** "Audit dependencies for
CVEs and open fix PRs"
**Schedule:** weekly
**MaxRuns:** 12
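A tasks.md file like the one above could be parsed into task objects with a few regexes: `## Task:` headings start a task, `**Key:** value` lines set fields, and bare lines continue a multi-line value. This parser is a hedged sketch of the format shown, not the product's actual loader.

```javascript
// Illustrative: parse the tasks.md format into plain task objects.
// Field keys are lowercased (e.g. **MaxRuns:** becomes `maxruns`).
function parseTasks(markdown) {
  const tasks = [];
  let current = null;
  let lastKey = null;
  for (const raw of markdown.split("\n")) {
    const line = raw.trim();
    const heading = line.match(/^## Task: (.+)$/);
    const field = line.match(/^\*\*(\w+):\*\* (.+)$/);
    if (heading) {
      current = { name: heading[1] };
      tasks.push(current);
      lastKey = null;
    } else if (field && current) {
      lastKey = field[1].toLowerCase();
      current[lastKey] = field[2];
    } else if (line && current && lastKey) {
      current[lastKey] += " " + line; // continuation of a wrapped value
    }
  }
  return tasks;
}

const md = `## Task: Security Scan
**Prompt:** "Audit dependencies for
CVEs and open fix PRs"
**Schedule:** weekly
**MaxRuns:** 12`;

console.log(parseTasks(md)[0].schedule); // "weekly"
```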
Morning Health Check
Running
⏱ Started 2m ago ⚙ 8 tool calls ↻ Run #47
Security Scan
Next: Mon 9am
↻ 4 of 12 runs ⏱ Last: 3 days ago
Nightly DB Backup Verify
Completed
✓ 3m 12s ⚙ 14 tool calls ↻ Run #182
Incident Postmortem Generator
Trigger: on-alert
↻ Event-driven ⏱ Last: 12 hours ago
05 / Interactive Views

Reports That Come Alive

The agent doesn't just output text—it generates full interactive HTML dashboards, charts, and reports rendered live in a sandboxed browser view. Share them with expiring public links, edit them in a built-in code editor, or let the agent iterate on the design.

Agent-generated HTML/JS/CSS rendered live in sandboxed iframe
WebGL and Canvas for high-performance data visualization
Built-in CodeMirror editor—tweak the agent's output directly
Public share links with configurable expiration (1h to 30 days)
Spreadsheet viewer for CSV/Excel, image viewer, syntax-highlighted code
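Expiring share links can be modeled as an absolute expiry timestamp clamped to the 1-hour-to-30-day range the UI offers. `createShareLink` and `isExpired` are hypothetical names for illustration.

```javascript
// Illustrative: share links carry an expiry timestamp; requested TTLs
// are clamped to the advertised 1h..30d range.
const HOUR = 60 * 60 * 1000;
const MIN_TTL = 1 * HOUR;
const MAX_TTL = 30 * 24 * HOUR;

function createShareLink(viewId, ttlMs, now = Date.now()) {
  const ttl = Math.min(Math.max(ttlMs, MIN_TTL), MAX_TTL);
  return { viewId, expiresAt: now + ttl };
}

function isExpired(link, now = Date.now()) {
  return now >= link.expiresAt;
}

const link = createShareLink("infra-health-2026-02-19", 24 * HOUR, 0);
console.log(isExpired(link, 23 * HOUR)); // false
console.log(isExpired(link, 25 * HOUR)); // true
```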
omlet.internal/views/infra-health-2026-02-19.html
99.97% Uptime · 142ms P50 Latency · 3 Active Alerts
Request Rate (24h)
06 / View Modes

Three Ways to See the Work

Every session can be viewed in three modes: full detail for engineers, a pipeline timeline for investigations, and an executive summary for stakeholders. Switch instantly—same session, different lens.
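One way to think about the three modes: a single session object rendered through different lenses. A sketch under assumed field names (`events`, `milestone`, `label`), purely to illustrate the one-session-many-views idea.

```javascript
// Illustrative: the same session, projected three ways.
// "summary" keeps totals, "pipeline" keeps milestones, "list" keeps all.
function renderSession(session, mode) {
  switch (mode) {
    case "summary":
      return {
        tools: session.events.filter((e) => e.type === "tool").length,
        errors: session.events.filter((e) => e.type === "error").length,
      };
    case "pipeline":
      return session.events.filter((e) => e.milestone).map((e) => e.label);
    default: // "list": full detail for engineers
      return session.events;
  }
}

const session = {
  events: [
    { type: "tool", label: "Bash: kubectl top pods -n checkout" },
    { type: "tool", label: "Edit: redis-config.yaml", milestone: true },
    { type: "error", label: "grep: no matches in app.log" },
  ],
};
console.log(renderSession(session, "summary")); // { tools: 2, errors: 1 }
```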

List
Pipeline
Summary
Focus Mode
Detail
Why is checkout-service slow?
Reading checkout-service/config.yaml
Running: kubectl top pods -n checkout
Searching for "timeout" in logs
Redis cache TTLs were removed in the last deploy. Cache hit ratio dropped from 94% to 12%, causing all requests to hit Postgres directly.
Editing redis-config.yaml
Fixed. TTLs restored, cache recovering. P99 dropping.
Pipeline
Timeline
User prompt
Checkout latency investigation
14 tool calls
Bash ×6, Read ×4, Grep ×3, Task ×1
Insight
Cache TTL removal caused miss storm
8 tool calls
Edit ×3, Bash ×5
Resolved
P99: 2.4s → 180ms. All alerts cleared.
Summary
Executive
22 Tools · 3 Outputs · 0 Errors
Top Tools
Bash ×11 Read ×4 Grep ×3 Edit ×3
Phases
Investigation Code Change Verification
Root Cause
Config deploy removed Redis TTL. Cache miss rate spiked, exhausting DB pool.

Put Your Infrastructure
on Autopilot

An AI agent that doesn't just alert you—it investigates, fixes, reports, and learns. Start running autonomous SRE operations today.