Second in a series on Ai and databases.
One Story, three signals – I have a backup of a critical database that has failed three times, the recovery point objective has been breached, the transaction log has a LOG_REUSE state I seem to have failed SQL agent jobs too. Monitoring tools could / should pickup these alerts but I still have the decipher everything. At 2am I could do with some help.
During these callouts there are many thoughts on my mind – are the databases ok (online and readable)? what’s going on in the error logs? Any issues with overnight agent jobs? Are the logs growing out of control – where are these errors coming from, these are basic DBA 101s but together they form a picture. An agent in this world forms a hypothesis after it understands the first signal, tests it with another one and connects the dots based on evidence – we don’t want the LLM to start fabricating things, do we?
Comparing to traditional monitoring tools / scripts it is the same input data but completely different output – not just what but why.