Agent-on-Agent Commerce: 7 Amazing Proven Anthropic Lessons

Agent-on-agent commerce is no longer just a futurist phrase about AI systems buying and selling for people. Anthropic's Project Vend showed what happens when a Claude agent is asked to run a small shop, manage inventory, set prices, respond to customer requests, and coordinate with tools over time.

The experiment was not a public marketplace where thousands of bots traded with each other. It was more like a controlled test market: a real office shop, real demand signals, human customers, simulated tools, and a long-running AI business manager called Claudius. That makes it useful because agent-on-agent commerce will need the same building blocks: product discovery, negotiation, payments, memory, pricing rules, and governance.

This article uses the Anthropic Project Vend report, the related Andon Labs Vending-Bench research, and Progressive Robot guidance on AI strategy, workflow automation, DevOps services, and business process automation to explain what the test means for business leaders.

Question	Practical answer
What was tested?	Claude Sonnet 3.7 ran a small automated office shop for about a month
Why does it matter?	It previews AI agents managing economic resources, prices, and customers
Did it succeed?	It showed useful skills but did not run the shop profitably
Main lesson	The infrastructure around the model matters as much as the model
Business takeaway	Start with governed pilots before letting agents buy, sell, or negotiate

What Anthropic actually tested with Project Vend

Project Vend asked Claude to act as the owner of a small in-office vending business. The agent had to choose stock, track inventory, interact with customers, adjust prices, and coordinate with human helpers who could perform physical tasks such as restocking.

That setup matters because it moved beyond a normal chatbot session. Claudius was not simply answering questions. It was expected to operate over time, preserve notes, make decisions under uncertainty, and manage resources. Those are the same pressures that will shape agent-on-agent commerce in real business settings.

The test shop included a small refrigerator, baskets, an automated checkout process, Slack-style customer interaction, web search, note-taking, and tools for changing prices. Anthropic and Andon Labs used the experiment to explore whether model autonomy could survive messy real-world commerce.

The answer was mixed. Claudius could identify suppliers, adapt to unusual customer requests, and resist some unsafe prompts. It also made enough commercial mistakes that Anthropic concluded that, if it were deciding today to expand into the in-office vending market, it would not hire Claudius.

For agent-on-agent commerce, that mixed result is the point: autonomy can be useful before it is fully trustworthy.

Why agent-on-agent commerce is more than chatbot shopping

Agent-on-agent commerce is not just a chatbot recommending a product. It is a system where software agents may discover products, compare suppliers, negotiate terms, trigger payments, check delivery status, update records, and resolve exceptions with limited human intervention.

That requires a different design mindset. A shopping assistant can be wrong and still be corrected by a user. A commerce agent with permissions can spend money, accept a bad deal, leak sensitive demand signals, or create contractual confusion if the workflow is not controlled.

Project Vend is valuable because it shows the hidden layers behind autonomous commerce. The agent needed tools for search, communications, price changes, notes, and business tracking. In a future marketplace, those tools would include procurement systems, payment rails, identity checks, catalogues, tax rules, and dispute handling.

The model is only one part of the stack. Agent-on-agent commerce needs identity, permissions, ledgers, policy checks, memory, monitoring, and human escalation. Without that harness, a promising demo becomes an expensive operational risk.

How Claude ran the test marketplace

Claude's shopkeeping agent operated like a tiny digital middle manager. It monitored stock, researched products, handled requests, made restocking choices, and communicated through a team channel. This made the experiment a useful proxy for AI-run marketplaces, even though most counterparties were people.

The design also gave Claudius flexibility. It did not have to stock only normal snacks. Employees asked for unusual goods, including specialty metal items, and the agent tried to respond. That flexibility created both creativity and risk.

In a production version of agent-on-agent commerce, agents would need structured supplier catalogues instead of free-form improvisation. They would need clear unit economics before quoting prices, approved vendors before purchase, and rules for when a request is outside the permitted business model.
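As a rough illustration of those rules, the sketch below checks a proposed quote against an approved-vendor list and a minimum margin before the agent is allowed to send it. The vendor names, the 20% margin, and the function and class names are all assumptions made for this example; none of them come from the Project Vend report.

```python
from dataclasses import dataclass

# Hypothetical values for illustration only.
APPROVED_VENDORS = {"acme-snacks", "metro-beverages"}
MIN_MARGIN_PCT = 20  # assumed minimum markup over unit cost


@dataclass(frozen=True)
class QuoteRequest:
    vendor: str
    unit_cost_cents: int
    proposed_price_cents: int


def check_quote(req: QuoteRequest) -> tuple[bool, str]:
    """Return (allowed, reason) for a proposed customer quote."""
    if req.vendor not in APPROVED_VENDORS:
        return False, "vendor not approved; escalate to a human"
    if req.unit_cost_cents <= 0:
        return False, "unit cost unknown; cannot quote"
    # Integer arithmetic avoids floating-point rounding on money.
    if req.proposed_price_cents * 100 < req.unit_cost_cents * (100 + MIN_MARGIN_PCT):
        return False, "price below minimum margin"
    return True, "ok"


if __name__ == "__main__":
    print(check_quote(QuoteRequest("acme-snacks", 1000, 1500)))
    print(check_quote(QuoteRequest("acme-snacks", 1000, 1100)))
    print(check_quote(QuoteRequest("mystery-metals", 1000, 5000)))
```

The point of the design is that the check runs outside the model: the agent can propose any price it likes, but a quote that fails the rule never reaches the customer without human review.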

The experiment also showed why long-running context is hard. An agent that operates for days or weeks cannot rely on one conversation window. It needs memory, state, summaries, and audit trails that stay accurate when new decisions depend on old commitments.

What went wrong with pricing, memory, and discounts

The most useful part of Project Vend is the failure analysis. Claudius missed profitable opportunities, hallucinated payment details, sold some items at a loss, underused demand signals, and was persuaded into offering discounts that damaged the business.

Those are not small issues. Agent-on-agent commerce will depend on reliable pricing and payment instructions. If one agent sends a false payment address, another agent may execute it automatically. If one agent accepts a discount request without policy checks, automated buyers could exploit that weakness repeatedly.
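One way to picture the two safeguards described here is a pair of small checks that sit between the agent and the payment system. This is a minimal sketch under stated assumptions: the payee registry, the account identifiers, and the 5% discount ceiling are hypothetical, and a real deployment would back them with an actual identity and payments service.

```python
# Hypothetical registry of payee accounts verified out of band by a human.
VERIFIED_PAYEES = {"office-shop": "ACCT-0001"}

# Assumed ceiling for discounts the agent may grant without approval.
MAX_AUTO_DISCOUNT_PCT = 5


def payment_instruction_is_trusted(payee_id: str, account: str) -> bool:
    """Accept an instruction only if it matches the verified registry exactly."""
    return VERIFIED_PAYEES.get(payee_id) == account


def decide_discount(requested_pct: int) -> str:
    """Return 'deny', 'grant', or 'escalate' for a requested discount."""
    if requested_pct <= 0:
        return "deny"
    if requested_pct <= MAX_AUTO_DISCOUNT_PCT:
        return "grant"
    return "escalate"


if __name__ == "__main__":
    print(payment_instruction_is_trusted("office-shop", "ACCT-9999"))
    print(decide_discount(25))
```

An account invented by the sending agent fails the registry lookup, and a large discount request is routed to a person instead of being granted because the requester was persuasive.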

The discount problem is especially important. The underlying model was trained to be a helpful assistant, and Anthropic speculated that this made Claudius too willing to agree to customer requests such as discounts. Future commerce agents will need a stronger distinction between customer service and commercial authority.

Memory also proved difficult. The agent did not reliably learn from feedback and sometimes repeated patterns it had already agreed to change. In business automation, that means memory cannot be treated as a vague transcript. It needs structured records, approvals, financial controls, and a way to correct bad assumptions.
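A structured record of that kind could look something like the sketch below, where each business rule the agent has agreed to is stored as an explicit entry with an approver, and a newer rule on the same topic supersedes the older one without erasing it. The class names, fields, and sample rules are hypothetical and only meant to show the shape of the idea.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Commitment:
    topic: str
    rule: str
    approved_by: str
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    active: bool = True


class CommitmentLog:
    """Append-only log of agreed rules; old entries stay for audit."""

    def __init__(self) -> None:
        self._entries: list[Commitment] = []

    def record(self, topic: str, rule: str, approved_by: str) -> Commitment:
        # Mark any earlier rule on this topic as superseded, but keep it.
        for entry in self._entries:
            if entry.topic == topic and entry.active:
                entry.active = False
        commitment = Commitment(topic, rule, approved_by)
        self._entries.append(commitment)
        return commitment

    def current(self, topic: str) -> Optional[Commitment]:
        """Return the rule currently in force for a topic, if any."""
        for entry in reversed(self._entries):
            if entry.topic == topic and entry.active:
                return entry
        return None

    def history(self, topic: str) -> list[Commitment]:
        """Return every rule ever recorded for a topic, oldest first."""
        return [e for e in self._entries if e.topic == topic]
```

Before acting on a discount request, for example, the agent's harness would look up `current("discounts")` rather than hoping the relevant agreement is still somewhere in the conversation history.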

What went right for autonomous commerce

Project Vend was not a simple failure story. Claudius identified suppliers, adapted to customer demand, used web research, tracked inventory, and resisted some attempts to push it toward unsafe behaviour. Those capabilities are exactly why autonomous commerce remains interesting.

The experiment suggests that agent-on-agent commerce could become practical in narrow, well-controlled environments. Internal procurement, office supplies, inventory replenishment, B2B quote comparison, subscription management, and low-risk reordering are realistic starting points.

The key is to separate suggestion from execution. An agent can research suppliers, prepare a recommended order, flag price changes, and draft a purchase request before a human approves payment. Over time, low-risk categories can move toward more automation when performance data supports it.
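The split between suggestion and execution can be sketched as a tiny state machine in which the agent may create a purchase draft but only a human approval moves it to a state where it can be executed. All names here, including the supplier and approver, are hypothetical, and the `execute` function only returns a string where a real system would call a procurement or payment API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    DRAFT = "draft"
    APPROVED = "approved"
    EXECUTED = "executed"


@dataclass
class PurchaseDraft:
    supplier: str
    item: str
    quantity: int
    total_cents: int
    status: Status = Status.DRAFT
    approver: Optional[str] = None


def approve(draft: PurchaseDraft, approver: str) -> None:
    """Human step: only a draft can be approved."""
    if draft.status is not Status.DRAFT:
        raise ValueError("only drafts can be approved")
    draft.status = Status.APPROVED
    draft.approver = approver


def execute(draft: PurchaseDraft) -> str:
    """Agent step: refuses to run unless a human has approved the draft."""
    if draft.status is not Status.APPROVED:
        raise PermissionError("purchase has not been approved by a human")
    draft.status = Status.EXECUTED
    return f"ordered {draft.quantity} x {draft.item} from {draft.supplier}"


if __name__ == "__main__":
    draft = PurchaseDraft("acme-snacks", "sparkling water", 24, 3600)
    approve(draft, "ops-lead")
    print(execute(draft))
```

Moving a low-risk category toward more automation then becomes a policy change, such as auto-approving drafts under a threshold, and not a change to what the agent itself is permitted to do.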

Anthropic also noted that better scaffolding could improve results. That includes stronger prompts, easier business tools, improved search, CRM-style tracking, better memory, and training for sound commercial decisions. In other words, the test points back to infrastructure, not only model intelligence.

That is the practical path for agent-on-agent commerce: improve the harness around the agent before expanding its authority.

Risks businesses should plan for before AI agents buy and sell

Before businesses adopt agent-on-agent commerce, they need clear controls. The first risk is financial loss: an agent might overpay, sell below cost, buy too much inventory, accept bad terms, or fail to notice a cheaper alternative.

The second risk is security. Commerce agents will touch payment systems, supplier accounts, contracts, internal demand data, and customer communication. That makes tool permissions, credential storage, network access, and audit logs essential.

The third risk is manipulation. If agents can be persuaded by friendly messages, fake urgency, malicious product pages, or prompt injection hidden in supplier content, attackers will treat them as new targets. Agent-on-agent commerce needs trusted data boundaries and policy enforcement before automated execution.

The fourth risk is accountability. When two agents negotiate, who is responsible for the result? Businesses need approval thresholds, legal review for high-value commitments, dispute resolution paths, and logs that explain why the agent acted.

A sensible roadmap starts with read-only monitoring, then recommendation workflows, then limited transactions under spending caps. The goal is not to stop commerce automation. The goal is to make it auditable, reversible, and aligned with business policy.
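The three stages of that roadmap can be expressed as autonomy levels enforced by a guard that tracks spending against a cap. This is a sketch under assumptions: the level names, the cap, and the `SpendGuard` class are invented for illustration, and a production version would also need cap resets per period, persistence, and logging.

```python
from enum import IntEnum


class AutonomyLevel(IntEnum):
    READ_ONLY = 0         # agent may only observe
    RECOMMEND = 1         # agent may propose, humans transact
    LIMITED_TRANSACT = 2  # agent may spend within a cap


class SpendGuard:
    """Authorizes spending only at the top level and only under the cap."""

    def __init__(self, level: AutonomyLevel, cap_cents: int) -> None:
        self.level = level
        self.cap_cents = cap_cents
        self.spent_cents = 0

    def authorize(self, amount_cents: int) -> bool:
        if self.level < AutonomyLevel.LIMITED_TRANSACT:
            return False
        if amount_cents <= 0:
            return False
        if self.spent_cents + amount_cents > self.cap_cents:
            return False
        self.spent_cents += amount_cents
        return True


if __name__ == "__main__":
    guard = SpendGuard(AutonomyLevel.LIMITED_TRANSACT, cap_cents=5000)
    print(guard.authorize(3000))  # within the cap
    print(guard.authorize(2500))  # would exceed the cap
```

Because a rejected request leaves `spent_cents` unchanged, a denied transaction has no side effects, which keeps the guard's record of spending in step with what was actually authorized.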

Agent-on-agent commerce FAQ

Did Anthropic create a public AI marketplace?

No. Project Vend was a controlled experiment where Claude managed a small office shop. It is better understood as a test market that previews the infrastructure and risks behind future agent-on-agent commerce.

Why did the experiment matter if it lost money?

The losses are the lesson. They showed where autonomous agents need better pricing discipline, memory, tools, safeguards, and oversight before businesses trust them with real budgets.

Could AI agents negotiate with each other in the future?

Yes, but negotiation will need identity, authentication, spending limits, audit logs, contract rules, and human approval for sensitive decisions. Otherwise automated negotiation can amplify errors quickly.

What is the safest first business use case?

Start with low-risk recommendations: supplier research, price comparison, reorder alerts, inventory summaries, and purchase drafts. Let humans approve payments until the process is proven.

What should teams build before deploying commerce agents?

Build a harness around the agent: approved tools, least-privilege permissions, structured memory, retrieval, evaluations, spend limits, logs, and escalation rules. That infrastructure is the difference between a demo and a dependable system.

Agent-on-agent commerce is coming into view because AI agents are starting to manage tasks over time, not just answer prompts. Anthropic's test marketplace shows that the opportunity is real, but the operating model is still immature.

The next stage of agent-on-agent commerce will depend on trust infrastructure as much as model performance.

For now, the best strategy is cautious progress. Use agents to research, compare, recommend, and document. Add spending authority only when the workflow has metrics, monitoring, and accountable human owners. If your team wants to design a safe autonomous commerce pilot, contact Progressive Robot to build the roadmap.