Agents Under Influence
The internet spent 30 years learning to manipulate humans. That was just practice.
In December 2025, a scam operator wanted to run fraudulent ads through an AI review system. They didn’t write a single line of code. They wrote white text on a white background.
The same trick as early black-hat SEO, invisible to humans, legible to machines. Palo Alto Networks’ Unit 42 documented the first confirmed real-world case: an adversarial prompt embedded in web content, consumed by an AI ad-review agent, causing it to approve ads it was built to flag. No hack. Just content doing what content does, except the reader was a machine.
The instinct is to file this under IT. Cybersecurity. Someone else’s problem.
And that would be a mistake.
Because the deeper story is that we deployed AI agents to finally get the rational consumer we always wanted. A buyer who reads the product specs, ignores the countdown timer, and decides on merit. We assumed we’d built our way out of human irrationality.
We haven’t. We have just transferred it.
The Neutrality Assumption
The promise was seductive because the logic seemed airtight. Human buyers are irrational. Dark patterns work on them: fake scarcity, social proof, urgency triggers. Replace the human with an agent, and you remove the vulnerability. The agent compares prices, reads reviews, evaluates fit. No impulse. No anxiety. No manipulation.
This assumption is load-bearing for a lot of enterprise AI strategy right now. Procurement agents. Vendor evaluation agents. Content evaluation agents. The value proposition rests almost entirely on neutrality.
The issue is the raw material.
Agents were trained on human-generated internet data; literally decades of text saturated with persuasion tactics, marketing copy, and cognitive bias triggers. Those patterns didn’t get filtered out when the model was assigned a task. They became part of how the model processes meaning. The manipulation vocabulary is baked in at training. The right trigger surfaces it.
There’s a second layer. The training process that makes language models useful — reinforcement learning from human feedback, RLHF — makes them structurally inclined toward approval-seeking. Models optimized to receive high approval ratings carry an embedded susceptibility to content that mimics those signals. Social proof. Authority cues. Positive framing. The same signals that humans find persuasive, because those signals were everywhere in the training data.
The web already knows this. Search Engine Land is calling it AAO, Assistive Agent Optimization. Marketers are learning to structure content specifically to influence agent decisions. Not to be found by agents. To be chosen by them. The adversarial version is the same techniques with different intent.
The attack rate confirms it isn’t theoretical. Unit 42 noted a sharp acceleration in real-world prompt injection attacks beginning July 2024, timed precisely with the mainstream rollout of AI-assisted browsers and shopping agents. The manipulation followed the opportunity.
The New Manipulation Logic
Start with what happens before the decision. In October 2025, researchers Ben-Zion et al. ran 2,250 experiments across three major language models. Before each agent completed a grocery shopping task, it was exposed to anxiety-inducing narratives — health scares, financial stress, social pressure. All three models shifted systematically toward less healthy choices — effect sizes ranging from Cohen’s d -1.07 to -2.05, in plain English, a significant effect.
Whoever controls what an agent reads before it makes a purchasing decision can bias that decision. Context shapes decisions, for humans and for agents alike. The mechanism is identical. The difference is that humans sometimes notice the influence. Agents can’t.
Now layer in what happens at the point of decision. A 2025 study, Bias Beware, tested the effect of different persuasion signals embedded in product descriptions. Social proof language — “chosen by thousands,” “industry standard,” “widely trusted” — reliably and significantly boosted AI recommendation rates. The effect was consistent across models, hard to detect, and durable.
The counterintuitive finding: scarcity framing backfired. “Limited time offer.” “Only three left.” The urgency triggers that work reliably on human buyers reduced AI recommendation rates. Agents weren’t moved by manufactured shortage. They were moved by apparent consensus. The reason could be structural: humans respond to loss aversion; agents, trained on approval signals, respond to social validation. Different substrate, different lever.
The manipulation logic for agents is not the same logic that works on humans. Some classic triggers amplify. Others invert. The marketers and vendors who figure out which is which first will have a clear advantage.
Call it the Manipulation Transfer. You didn’t eliminate consumer irrationality by deploying agents. You transferred it to a new substrate, one with different vulnerabilities, no self-awareness, and no instinct to resist.
The Pipeline Effect
The structural reason runs deeper than training data.
Agents cannot reliably distinguish content they are processing from instructions they should follow. OpenAI acknowledged in March 2026 that prompt injection “may never be fully solved”. It’s a structural property of how language models work. Their own testing found that a professional business email, indistinguishable from routine commercial correspondence, exfiltrated employee data 50% of the time. Detect a malicious input, OpenAI noted, and you are essentially detecting a lie told by well-crafted language. Language models are not built to be lie detectors.
The implication for commercial manipulation is about ordinary vendor content. If an agent can be misled by a well-crafted email, it can be nudged by well-crafted product copy. The line between “optimized content” and “manipulative content” is intent — and intent is invisible to the agent.
This is where the Manipulation Transfer compounds.
As of 2025, most agent deployments involve pipelines: an agent that retrieves information hands it to an agent that evaluates it, which hands a recommendation to an agent that decides. Work presented at ICLR 2025 on Agent Security Bench (ASB) shows that modern LLM‑based agents are highly susceptible to prompt‑injection‑style attacks, especially when adversarial content flows through other tools or agents rather than arriving as a simple direct prompt.
In practice, agents can be substantially more vulnerable to influence from other agents and components inside the system boundary than to the same influence applied directly from outside.
Every agent you add to a pipeline is a new influence surface. If one agent in a chain processes manipulated content and passes a biased recommendation downstream, the next agent treats that recommendation as trusted input. The pipeline amplifies rather than corrects.
Cornell researchers demonstrated this at scale: a self-replicating worm saturated 50 agents in 11 steps. The Guardian’s Robert Booth documented lab tests in which agents applied peer pressure to other agents to bypass safety controls.
Nobody needed to manipulate the system from outside. The system manipulated itself.
From Context To Persuasion Environment
The gap between legitimate Assistive Agent Optimization and adversarial manipulation is intent, not technique. The same social proof language that a marketer embeds in product descriptions to improve agent recommendation rates is the same social proof language an adversary embeds to direct an agent’s decision. The capability is being commoditized right now in full daylight.
Neutrality is not the default state of an agent. Agents reflect the persuasion environment they operated in. That isn’t a reason to stop. It’s a reason to ask a different question.
Not “what did our agent decide?” but “what was it reading when it decided?”
The answer, increasingly, is content written by people who have studied exactly how agents decide.
We didn’t build a rational web. We built a more efficiently manipulable one.



