Security in Managed AI Agent Fleets

Something that has been on my mind lately, with the rapid advancement of AI agents, is security. I've seen firsthand the power of an agent like Claude. But I've also seen the destruction that can happen when one gets flustered, and I've watched it actively try to work around guardrails I had deliberately placed.

I believe this is only going to become more of a challenge as the models get smarter and we introduce them to more and more non-technical people.

Very soon, every non-technical person in the world will be able to write custom applications from their cell phone without ever seeing any code. Zoom out, though, and what that really looks like is thousands of VPSes on a network controlled by a managed provider.

To give this powerful technology to the masses, which we will, we need to place guardrails on these systems so that people don't get themselves into trouble.

Thinking deeper about this, though, a paradox becomes clear: you cannot place your guardrails inside the agent's own environment, because the agent can modify its own source code.

Any blockers must be invisible to the agent, so it can't reason its way around them.
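
To make that concrete, here's a minimal sketch of external enforcement: a supervisor process, running outside the agent's sandbox under a separate user, that screens every shell command the agent proposes against a deny-list. All of the names and patterns here are made up for illustration; the point is only that the rules live somewhere the agent can't read or rewrite them.

```python
import re
import subprocess

# Hypothetical deny-list, illustrative only. It lives in the supervisor,
# not in the agent's environment, so the agent can't read or edit it.
DENY_PATTERNS = [
    r"\brm\s+-rf\b",          # recursive deletes
    r"\bmkfs\b",              # formatting disks
    r"curl\b.*\|\s*(ba)?sh",  # piping downloads straight into a shell
]

def run_agent_command(cmd: str) -> str:
    """Run a command proposed by the agent, or refuse without explanation."""
    for pattern in DENY_PATTERNS:
        if re.search(pattern, cmd):
            # Deliberately generic: no hint about which rule fired,
            # so there's nothing for the agent to reason its way around.
            return "error: command failed"
    result = subprocess.run(
        cmd, shell=True, capture_output=True, text=True, timeout=60
    )
    return result.stdout + result.stderr
```

A deny-list like this is obviously incomplete on its own; in practice you'd pair it with OS-level controls (read-only mounts, restricted users) that the agent equally can't touch.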

I also do not believe it is safe to use AI agents directly in security roles. You cannot place an LLM on a proxy router and expect it to reliably secure the server, because the gatekeeping agent can itself become compromised.
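
What does seem safe at the boundary is dumb, deterministic policy. Here's a sketch, using a made-up allowlist: the proxy forwards traffic only to hosts a human operator approved in advance, so there is no model in the loop to be prompt-injected or argued with.

```python
from urllib.parse import urlparse

# Hypothetical allowlist, maintained by a human operator.
ALLOWED_HOSTS = {"api.example.com", "registry.npmjs.org"}

def should_forward(url: str) -> bool:
    """Pure rule matching: forward only to pre-approved hosts."""
    host = urlparse(url).hostname
    return host in ALLOWED_HOSTS

assert should_forward("https://api.example.com/v1/jobs")
assert not should_forward("https://evil.example.net/payload.sh")
```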

In a managed network that isn't properly secured, a single compromised agent can poison the whole fleet. This means we need human-only control panels, accessed from local machines and undetectable to the network.
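
One way to make "undetectable to the network" concrete (sketched below with made-up addresses) is to bind the control panel to a management interface that the agent VPSes have no route to. From inside the fleet, the panel simply doesn't exist; only a human on the management LAN can reach it.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical management address. Agent VPSes sit on a different
# subnet with no route here, so they can't even discover this service.
MGMT_INTERFACE = "10.99.0.1"

class ControlPanel(BaseHTTPRequestHandler):
    def do_GET(self):
        # Placeholder status endpoint for the human operator.
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"fleet status: ok\n")

if __name__ == "__main__":
    HTTPServer((MGMT_INTERFACE, 8443), ControlPanel).serve_forever()
```

Binding alone isn't a full answer (you'd still firewall the management subnet at the switch), but it shows the direction: the human control plane lives on a path the agents physically can't take.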

This has been an interesting topic to wrestle with, and it seems to be an emerging industry. I thought some of you guys might want to think about it too.
 