How to harness AI Agents without breaking security
- Álvaro Ruiz

We are entering a new era in which AI doesn’t just generate content; it acts. AI agents, capable of perceiving their environment, making decisions, and taking autonomous actions, are beginning to operate across the enterprise. Unlike traditional Large Language Models (LLMs) that work within a confined prompt-response loop, agents can research information, call APIs, write and execute code, update records, orchestrate workflows, and even collaborate with other agents, all with little to no human supervision.
The excitement and hype surrounding AI agents are understandable. When designed and implemented correctly, these agents can radically streamline operations, eliminate tedious manual tasks, accelerate service delivery, and redefine how teams collaborate. McKinsey predicts that agentic AI could unlock between $2.6 trillion and $4.4 trillion annually across more than sixty enterprise use cases.
Yet, this enthusiasm masks a growing and uncomfortable truth. Enterprises leveraging agentic AI face a fundamental tension: the trade-off between utility and security. An agent can only deliver real value when it’s entrusted with meaningful control, but every additional degree of control carries its own risks. With agents capable of accessing sensitive systems and acting autonomously at machine speed, organisations risk creating a new form of insider threat (on steroids), and many are not remotely prepared for the security risks that agentic AI introduces.
The vast majority of leaders with cybersecurity responsibilities (86%) reported at least one AI-related incident from January 2024 to January 2025, and fewer than half (45%) feel their company has the internal resources and expertise to conduct comprehensive AI security assessments. Rushing to deploy digital teammates into production before establishing meaningful security architecture has a predictable result. Gartner now forecasts that more than 40% of agentic AI projects will be cancelled by 2027, citing inadequate risk controls as a key reason.
This blog post covers the risks that pose the greatest challenges for organisations building or adopting AI agents today and how to minimise them, enabling technical leaders and developers to make informed, responsible decisions around this technology.
Harness the power of agentic AI with our analysts' help. Talk to an analyst here.
The dark side of AI agents
Rogue actions and the observability gap
Traditional software behaves predictably. Given the same inputs, it produces the same outputs. Understanding results and debugging is therefore a matter of tracing logic, replicating conditions, and fixing the underlying error. However, agentic AI breaks this paradigm. Agents do not follow deterministic paths, meaning their behaviour isn’t always repeatable even with identical inputs, and complex, emergent behaviours can arise that weren’t explicitly programmed. Worse, most systems that agents interact with today lack any understanding of why an agent took a particular action. Traditional observability wasn’t designed to understand why a request happened, only that it did.
This creates a profound observability gap, where organisations can’t understand or replay an agent’s decision sequence. A minor change in context, memory, or input phrasing can lead to an entirely different chain of tool calls and outputs. As a result, traditional debugging techniques collapse. When something goes wrong, teams are often left guessing whether the issue came from the underlying model, the agent design, an external dependency, a misconfigured tool, corrupted memory, or adversarial input.
This problem is exacerbated by the degree of autonomy an agent has, as the longer an agent operates independently and the more steps it takes without human oversight, the larger the gap between intention and action can become. Without robust audit logs designed for agentic systems, organisations can’t reliably answer fundamental questions such as: What did the agent do? Why did it choose those actions? What data did it access? Which systems did it interact with? Could the behaviour repeat?
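To make that concrete, the sketch below shows the kind of structured audit record an agentic runtime could emit at every step. It is a minimal, hypothetical Python schema (the field names are assumptions rather than any standard), but each field maps directly onto one of the questions above.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class AgentAuditRecord:
    """One record per agent step; illustrative fields mapping to the questions above."""
    agent_id: str                 # which agent acted
    run_id: str                   # groups every step of one task
    step: int                     # position in the decision sequence
    triggering_input: str         # what the agent was reacting to
    stated_reasoning: str         # why it says it chose this action
    tool_name: str                # what it did
    tool_args: dict               # with which parameters
    data_accessed: list[str]      # what data it touched
    systems_touched: list[str]    # which systems it interacted with
    model_version: str            # needed to judge whether the behaviour could repeat
    outcome: str                  # result or error returned
    timestamp: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())
```

Persisting one such record per step turns the questions above into queries over a log rather than a forensic exercise.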
Expanded attack surface and agents as a new insider threat
When you give an AI agent the ability to act, particularly across internal systems, you effectively create a new privileged user inside your organisation. Too often, this user is granted broad, overly generous permissions, because restrictions seem to “block the agent from being helpful”. This disregards the principle of least privilege, a cornerstone of cybersecurity, and, as highlighted earlier in this post, every added degree of autonomy or access carries its own risks. Your “highly efficient digital teammate” can very quickly become a potent insider threat.
Granting agents broad access and permissions to internal documents, systems, repositories, or databases dramatically expands an organisation's attack surface, especially when these agents interact with external services. If an attacker succeeds in injecting malicious instructions through poisoned data, manipulated content, compromised memory, tampered tools, or adversarial prompts, the agent can unknowingly carry out harmful actions on the attacker’s behalf. It may leak sensitive information, modify records, escalate privileges, execute financial transactions, trigger unwanted workflows, or expose data to external systems. The danger compounds in multi-agent environments, where one agent’s compromised output can cascade into others, amplifying the impact of even small vulnerabilities.
Agentic drift
Agents operate in dynamic environments, learn, adapt, and evolve. Over time, this evolution can lead to agentic drift. An agent that performs well today might degrade tomorrow, producing less accurate or entirely incorrect results. Many factors can influence this, such as updates to underlying models, changes to inputs, changes to business context, system integrations, or agent memory. Because drift often emerges gradually, organisations may not notice until the consequences are significant, especially for agents interacting with external stakeholders (e.g. customer service agents) or operating in multi-agent workflows, where drift can cause cascading failures.
Moreover, because AI agents are inherently goal-driven, drift can take the form of agents optimising for the metrics they can observe rather than the outcomes humans intended. This leads to specification gaming, where agents find undesirable shortcuts that technically satisfy the objective while undermining policy, ethics, or safety. For example, an agent tasked to “reduce task completion time” may quietly eliminate necessary review steps; an agent configured to “increase customer satisfaction” might disclose information it shouldn’t; or a coding agent tasked to “fix errors” might make changes that violate security or compliance constraints.
How to build agents safely
The risks of agentic AI are significant, but the solution is not to avoid agents altogether. The value is too great, and the competitive pressure is too high. Instead, organisations must treat agentic AI as a new class of enterprise technology, requiring its own security model, governance structures, and operational rigour. As the saying goes, “a chain is only as strong as its weakest link”. Don’t introduce a weaker one. To position your organisation to harness the full potential of agentic AI safely, it’s essential to understand how to mitigate these risks.
Establish a rigid command hierarchy. To ensure accountability, AI agents must operate under a clearly defined chain of command where human supervision is technically enforced. Every agent should have one or more designated controllers whose directives are distinguishable from all other inputs. This distinction is crucial because agents process vast amounts of untrusted data (such as emails or web content) that can contain hidden instructions designed to hijack the system (prompt injection). Therefore, the security architecture must prioritise the controller’s voice and system prompts above all other noise. Furthermore, for high-stakes actions, such as deleting important datasets, sharing sensitive data, authorising financial transactions, or modifying security configurations, explicit human confirmation should always be required (“human-in-the-loop”).
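As a rough illustration, here is a minimal Python sketch of how a human-in-the-loop gate might sit between an agent’s proposed action and its execution. The list of high-stakes actions, the `dispatch_to_tool` placeholder, and the console approval channel are all assumptions you would replace with your own stack.

```python
# Hypothetical sketch: action names and the approval channel are illustrative.
HIGH_STAKES_ACTIONS = {
    "delete_dataset",
    "share_data_externally",
    "authorise_payment",
    "modify_security_config",
}

def dispatch_to_tool(action: str, params: dict) -> str:
    # Placeholder for the real tool-integration layer.
    return f"executed {action} with {params}"

def execute_action(action: str, params: dict, controller_approve) -> str:
    """Run an agent-proposed action, pausing for explicit human confirmation
    whenever the action is on the high-stakes list."""
    if action in HIGH_STAKES_ACTIONS:
        prompt = f"Agent requests high-stakes action '{action}' with {params}. Approve?"
        if not controller_approve(prompt):
            return "rejected: controller denied the action"
    return dispatch_to_tool(action, params)

# Example: a console prompt standing in for the controller's approval channel.
result = execute_action(
    "delete_dataset",
    {"name": "quarterly_events"},
    controller_approve=lambda msg: input(msg + " [y/N] ").strip().lower() == "y",
)
```

Whatever the channel, the important property is that the confirmation happens outside the agent’s own reasoning loop, so the final say on irreversible actions stays with a human.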
Enforce dynamic, context-aware limitations. Security teams must move beyond broad, static permissions and instead enforce strict, purpose-driven limits on what agents can do. Agents’ capabilities must adapt dynamically to the specific context of the current workflow, extending the traditional principle of least privilege. For example, an agent tasked with doing online research should be technically blocked from deleting files or sharing data, regardless of its base privileges. To achieve this, organisations require robust authentication and authorisation systems designed specifically for AI agents, with secure, traceable credentials that allow administrators to review an agent’s scope and revoke permissions at any time.
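A minimal sketch of that idea is shown below, assuming a Python runtime where every workflow declares its purpose up front. The task names, tool names, and the `TASK_TOOL_POLICY` table are illustrative, not a real framework API.

```python
# Hypothetical sketch: per-task tool allowlists, checked outside the agent itself.
TASK_TOOL_POLICY: dict[str, set[str]] = {
    "online_research": {"web_search", "fetch_url", "summarise"},
    "report_drafting": {"read_document", "write_draft"},
    "data_cleanup":    {"read_table", "update_record"},  # no delete, no external sharing
}

class ToolNotAllowed(Exception):
    """Raised when a tool call falls outside the declared purpose of the workflow."""

def authorise_tool_call(task: str, tool: str) -> None:
    """Deny any tool not explicitly allowed for the current task context,
    regardless of the agent's base privileges."""
    if tool not in TASK_TOOL_POLICY.get(task, set()):
        raise ToolNotAllowed(f"Tool '{tool}' is not permitted in the '{task}' workflow.")

authorise_tool_call("online_research", "web_search")     # allowed
# authorise_tool_call("online_research", "delete_file")  # raises ToolNotAllowed
```

The point of the design is that the policy check sits outside the agent, so even a compromised or drifting agent cannot talk its way past it.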
Ensure observability of reasoning and action. Transparency is the only way to safely integrate autonomous agents into enterprise workflows. To ensure agents act safely, their operations must be fully visible and auditable. This requires implementing a logging architecture that captures more than just the final result. It must record the agent’s chain of thought, including the inputs received, reasoning steps, tools used, parameters passed, and outputs, enabling organisations to understand why an agent made a specific decision. Crucially, this data cannot remain buried in server logs; it should be displayed in an intuitive interface that allows controllers to inspect the agent's behaviour in real time.
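One way to capture such a trace is sketched below, assuming a Python runtime where every tool call goes through a single dispatch function; the logger setup and record fields are illustrative rather than prescriptive.

```python
import json
import logging
import time

logging.basicConfig(level=logging.INFO)
audit_log = logging.getLogger("agent.audit")

def traced_tool_call(agent_id: str, reasoning: str, tool, tool_name: str, **params):
    """Execute a tool on the agent's behalf and emit a structured record of the
    reasoning, the call, and the outcome, so controllers can replay the step."""
    record = {
        "agent_id": agent_id,
        "tool": tool_name,
        "params": params,
        "reasoning": reasoning,   # the agent's own stated rationale for this step
    }
    started = time.time()
    try:
        result = tool(**params)
        record.update(outcome="success", output=str(result)[:500])
        return result
    except Exception as exc:
        record.update(outcome="error", output=repr(exc))
        raise
    finally:
        record["duration_s"] = round(time.time() - started, 3)
        audit_log.info(json.dumps(record))  # ship to a log store and review UI, not just server logs

# Example: tracing a hypothetical search tool.
traced_tool_call(
    "research-agent-01",
    reasoning="User asked for recent agentic AI security incidents.",
    tool=lambda query: f"results for {query}",
    tool_name="web_search",
    query="agentic AI security incidents 2024",
)
```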
Organisations that fail to invest early in these foundations may find themselves facing a new generation of incidents, faster, more powerful, and more opaque than anything their current security posture was designed to handle.
The next wave of innovation will not be driven by models that generate text, but by systems that take action. Is your organisation ready for what those actions entail? At SlashData, we can help you navigate the challenges of implementing and scaling agentic AI systems by providing data-backed evidence and insights on how developers successfully create agentic AI workflows, avoiding common pitfalls along the way.



