AgentDBA vs Critical SQL Server

It’s 07:43. Someone’s already left a message. “Something’s wrong with the DB server.” You open the terminal and go to work.

This post is about what happens next — and why I built AgentDBA to handle it the way it does.

What is AgentDBA?

AgentDBA is a self-hosted, CLI-based autonomous diagnostic reasoning engine for SQL Server. It is not a monitoring tool. It does not sit in the background polling metrics and firing alerts. You invoke it, it investigates, it reasons, and it tells you what it found and why.

The distinction matters. Monitoring tells you something happened. AgentDBA tells you what it means and what to look at next — grounded entirely in evidence it collected itself. No speculation. No plausible-sounding guesswork dressed up as a finding.

How it thinks

AgentDBA isn’t a script with an LLM bolted on. It reasons across multiple steps, remembers what it has seen before on your server, and decides for itself when it has enough evidence to conclude. Every action it takes is auditable — you can reconstruct exactly what it did and why without ever touching the LLM again. The LLM never receives raw SQL data. It receives structured, pre-processed findings. What it does with those findings is reasoning, not retrieval.
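To make the "structured, pre-processed findings" idea concrete, here is a minimal Python sketch. It is not AgentDBA's actual code: the `Finding` class, the `summarise_backups` and `build_llm_payload` helpers, the row field names, and the 24-hour threshold are all hypothetical. It only illustrates the pattern of reducing raw rows to compact findings before anything is handed to a model.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Finding:
    """A pre-processed observation; hypothetical shape, not AgentDBA's schema."""
    module: str
    severity: str
    summary: str
    evidence_ids: Tuple[str, ...]


def summarise_backups(rows: List[dict], max_age_hours: int = 24) -> List[Finding]:
    """Reduce raw backup-history rows to findings. Healthy rows produce nothing."""
    findings = []
    for index, row in enumerate(rows):
        age: Optional[float] = row["hours_since_full_backup"]
        evidence = (f"backup_row:{index}",)  # pointer back to the raw row
        if age is None:
            findings.append(Finding(
                "backups", "WARNING",
                f"{row['database']} has no backup recorded", evidence))
        elif age > max_age_hours:
            findings.append(Finding(
                "backups", "WARNING",
                f"{row['database']} last full backup is {age:.0f}h old", evidence))
    return findings


def build_llm_payload(findings: List[Finding]) -> str:
    """Serialise findings only; the raw rows are never part of the payload."""
    return json.dumps([asdict(f) for f in findings], sort_keys=True)
```

The evidence ids are the only link back to the raw data, so the model can cite what it reasoned over without ever seeing the rows themselves.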

The non-negotiable: evidence first

The agent does not fill gaps with inference. Every finding must trace directly to data it collected. If the cause cannot be proven from what it holds, it says so — explicitly. That rule is not a prompt suggestion. It is enforced at code level.
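One way a rule like this could be enforced in code rather than in a prompt is a gate that rejects any finding whose cited evidence was not actually collected. The sketch below is an assumption about the general technique, not AgentDBA's implementation; `Evidence`, `accept_finding`, and `UnsupportedFindingError` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List


class UnsupportedFindingError(ValueError):
    """Raised when a claim does not trace to collected evidence."""


@dataclass(frozen=True)
class Evidence:
    evidence_id: str
    source: str   # e.g. which module collected it
    detail: str


def accept_finding(claim: str, cited_ids: List[str],
                   collected: Dict[str, Evidence]) -> dict:
    """Return the finding only if every cited id maps to collected evidence."""
    if not cited_ids:
        raise UnsupportedFindingError(f"no evidence cited for: {claim}")
    missing = [eid for eid in cited_ids if eid not in collected]
    if missing:
        raise UnsupportedFindingError(
            f"claim cites evidence that was never collected: {missing}")
    return {"claim": claim, "evidence": [collected[eid] for eid in cited_ids]}
```

Because the check runs after the model responds, a persuasive but unsupported claim fails the same way a malformed one does.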

When AgentDBA encounters a database integrity event or a critical error log condition, it doesn’t deliberate. It escalates immediately and ends the session. No reasoning loop. No sweep across other modules. Just the finding and a human on the hook.

When the critical class is clear, AgentDBA investigates the rest of the server. It selects which areas to examine, reasons across what it finds and concludes with severity and confidence — or with explicit uncertainty where evidence doesn’t support a conclusion. Every tool call, every decision, and every raw result is written to audit telemetry. The session is fully reconstructable from the database alone.
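As a rough illustration of what "reconstructable from the database alone" can mean, the sketch below appends every event to a table and replays a session in order. The choice of SQLite, the `audit_event` table, and the `record` and `replay` helpers are assumptions for this example only; the post does not say how AgentDBA stores its telemetry.

```python
import json
import sqlite3
from datetime import datetime, timezone


def open_audit_store(path: str = ":memory:") -> sqlite3.Connection:
    """Create the (hypothetical) append-only audit table if it is missing."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS audit_event (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               session_id TEXT NOT NULL,
               recorded_at TEXT NOT NULL,
               kind TEXT NOT NULL,      -- tool_call | raw_result | decision
               payload TEXT NOT NULL    -- JSON document
           )""")
    return conn


def record(conn: sqlite3.Connection, session_id: str, kind: str, payload: dict) -> None:
    """Append one event; rows are never updated or deleted."""
    conn.execute(
        "INSERT INTO audit_event (session_id, recorded_at, kind, payload) "
        "VALUES (?, ?, ?, ?)",
        (session_id, datetime.now(timezone.utc).isoformat(), kind, json.dumps(payload)))
    conn.commit()


def replay(conn: sqlite3.Connection, session_id: str) -> list:
    """Rebuild a session's timeline from stored rows, in insertion order."""
    rows = conn.execute(
        "SELECT kind, payload FROM audit_event WHERE session_id = ? ORDER BY id",
        (session_id,))
    return [(kind, json.loads(payload)) for kind, payload in rows]
```

Replaying needs nothing but the stored rows, which is the property that lets a session be audited without calling the model again.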

A clean server first

Before I introduce chaos, here is what AgentDBA looks like against a healthy server. I connect via Windows Auth, the LLM connects to Azure OpenAI GPT-5.4, and I run a full health check.

The server is clean. The only thing it surfaces is a historical finding: this server previously had a transaction log space warning on msdb, which I resolved. The agent knows this — it logged the episode, marked it resolved, and references it in context. It is not re-raising an issue that no longer exists.
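The behaviour described here (log an episode, mark it resolved, mention it as history rather than re-raise it) can be sketched with a small in-memory store. `Episode`, `EpisodeMemory`, and the date strings are hypothetical; how AgentDBA actually persists its memory is not described in this post.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Episode:
    server: str
    issue: str
    first_seen: str
    resolved_on: Optional[str] = None  # None means the issue is still open


class EpisodeMemory:
    """Keeps past issues per server and separates open ones from history."""

    def __init__(self) -> None:
        self._episodes: List[Episode] = []

    def log(self, server: str, issue: str, first_seen: str) -> Episode:
        episode = Episode(server, issue, first_seen)
        self._episodes.append(episode)
        return episode

    def resolve(self, server: str, issue: str, resolved_on: str) -> None:
        for episode in self._episodes:
            if (episode.server, episode.issue) == (server, issue) and episode.resolved_on is None:
                episode.resolved_on = resolved_on

    def context_for(self, server: str) -> Dict[str, List[str]]:
        """Open issues are re-raised as active; resolved ones are context only."""
        mine = [e for e in self._episodes if e.server == server]
        return {
            "active": [e.issue for e in mine if e.resolved_on is None],
            "history": [f"{e.issue} (resolved {e.resolved_on})"
                        for e in mine if e.resolved_on is not None],
        }
```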

Clean bill of health. Now let’s break things.

Scenario: The Critical Storm

There are many general failures, but hidden deep within them are the dreaded Error 824 and Error 825, mixed with a database in RECOVERY_PENDING, which means it cannot be accessed until someone intervenes.

These are not “look into it when you get a chance” events. These are the events that need looking at right away. I’ve staged exactly this on my VM server: a flood of errors (50+) in the SQL Server error log, 825 read retry events buried within them, and a database sitting in RECOVERY_PENDING. This is the kind of noise that hides the thing that matters.

I run the health check.

AgentDBA never reaches the LLM.

AgentDBA finds the CRITICAL conditions and short-circuits immediately. The RECOVERY_PENDING database is flagged. The 824/825 errors are flagged. The session ends. Escalation fires.
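The short-circuit can be pictured as a gate that runs before any reasoning step. The sketch below is a simplified assumption, not AgentDBA's code: the function names are hypothetical, the critical class is narrowed to errors 824/825 and the RECOVERY_PENDING state, and the inputs are plain Python values. In practice the database state would come from something like `sys.databases.state_desc`, and the log lines from the SQL Server error log, where errors appear in the form `Error: 824, Severity: 24, State: 2.`

```python
import re
from typing import Callable, Dict, List

ERROR_RE = re.compile(r"Error: (\d+), Severity: (\d+)")
CRITICAL_ERRORS = {824, 825}             # assumption: the class used in this post
CRITICAL_STATES = {"RECOVERY_PENDING"}


def critical_findings(log_lines: List[str], database_states: Dict[str, str]) -> List[str]:
    """Pick out only the critical class, ignoring all other noise in the log."""
    findings = [f"database {name} is {state}"
                for name, state in sorted(database_states.items())
                if state in CRITICAL_STATES]
    seen = set()
    for line in log_lines:
        match = ERROR_RE.search(line)
        if match and int(match.group(1)) in CRITICAL_ERRORS:
            seen.add(int(match.group(1)))
    findings.extend(f"error {number} present in error log" for number in sorted(seen))
    return findings


def run_health_check(log_lines: List[str], database_states: Dict[str, str],
                     reason: Callable[[], List[str]],
                     escalate: Callable[[List[str]], None]) -> dict:
    """Escalate and stop on any critical finding; otherwise hand over to reasoning."""
    critical = critical_findings(log_lines, database_states)
    if critical:
        escalate(critical)
        return {"status": "ESCALATED", "findings": critical, "llm_called": False}
    return {"status": "COMPLETED", "findings": reason(), "llm_called": True}
```

Passing `reason` in as a callable makes the design choice visible: on the critical path it is simply never invoked.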

This is an explicit design decision. I do not want a reasoning loop when a database has a potential integrity event. I want to know immediately. Tell me. I’ll deal with it; everything else can wait.

The escalation router has hooks for Slack and Teams — not wired up in this demo, but the architecture is there. A CRITICAL finding should be in your on-call channel before you’ve finished your coffee.
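A router "with hooks" can be as small as a registry of callables. The sketch below is hypothetical: `EscalationRouter` and its methods are invented for illustration, and the hooks are plain functions. A real Slack or Teams hook would typically post the message to a webhook URL, which is left out here on purpose.

```python
from typing import Callable, Dict, List

Hook = Callable[[str], None]


class EscalationRouter:
    """Fans a CRITICAL message out to every registered channel hook."""

    def __init__(self) -> None:
        self._hooks: Dict[str, Hook] = {}

    def register(self, name: str, hook: Hook) -> None:
        self._hooks[name] = hook

    def escalate(self, severity: str, message: str) -> List[str]:
        """Return the names of the hooks that accepted the message."""
        if severity != "CRITICAL":
            return []  # only the critical class is escalated in this sketch
        delivered = []
        for name, hook in self._hooks.items():
            try:
                hook(f"[{severity}] {message}")
                delivered.append(name)
            except Exception:
                # One failing channel must not stop delivery to the others.
                continue
        return delivered
```

With no hooks registered, `escalate` returns an empty list, which matches a demo where the channels are not wired up yet.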

Fixing the critical issues

I work through the RECOVERY_PENDING database. I address the underlying cause and the critical checks now clear.

Missing Backups?

With that fixed, I call the agent again. With the critical class clear, AgentDBA now reasons across all other modules, including failed jobs, backup compliance, and log file health, and synthesises the findings.

It comes back with a finding. Not catastrophic. But real. SALESDB has no backup recorded.

This hadn’t surfaced during the first run — not because the agent missed it, but because the critical-class check is intentionally scoped. It isn’t a full severity sweep. It is locked to a specific class of problem: database integrity and high severity errors that require immediate human intervention before anything else runs.

RECOVERY_PENDING and 824/825 errors are in that class. From my point of view these are non-negotiable; I used to have nightmares when confronted with potential corruption. A broken backup chain, though important, is a different category of problem: serious, yes, but not corruption-serious.

IMPORTANT – When you call a specific module directly, AgentDBA focuses there. When you call a full health check, it prioritises catastrophic conditions first before broader investigation.

Root cause — and when it stays null

It will not connect the missing backup to something like a job failure, and it will not make something up. It needs direct evidence; anything else is fabrication. So it will say root cause unknown, or words to that effect.
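In code terms, "root cause stays null" can be modelled as an optional field that is only populated when some collected evidence links the cause to the finding. This is a sketch under that assumption; `Conclusion` and `conclude` are hypothetical names, not AgentDBA's API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Conclusion:
    finding: str
    root_cause: Optional[str]  # None means "unknown", never a guess
    note: str


def conclude(finding: str, candidate_cause: Optional[str],
             linking_evidence: List[str]) -> Conclusion:
    """Keep a candidate cause only when evidence ties it to the finding."""
    if candidate_cause and linking_evidence:
        return Conclusion(
            finding, candidate_cause,
            f"supported by {len(linking_evidence)} evidence item(s)")
    return Conclusion(
        finding, None,
        "root cause unknown: no collected evidence links a cause to this finding")
```

A plausible candidate with an empty evidence list is treated exactly like no candidate at all.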

That boundary is not a limitation. It is the feature. A diagnostic tool that invents causal chains sends you to fix the wrong thing.

I fix the backup chain. I run again.

Everything is clean.

What this is

AgentDBA is not trying to replace your judgment. It is trying to make sure that when something is wrong, you are looking at the right thing at the right time, within sixty seconds, not an hour later after manually correlating error logs, backup history, and job history across three SSMS windows.

The findings are evidence-bound. The reasoning is auditable. And the longer it runs on your estate, the more context it carries. That is the point.

Please note that AgentDBA is in beta. More information can be found at http://www.agentDBA.ai
