The promise patrol
As voice and browser agents get cheap, businesses will stop testing only whether systems work and start testing whether promises survive contact with real workflows. The next useful AI layer may be a quiet patrol that catches the gap between the website, the chatbot, the receptionist, the checkout, and the policy binder.

At 4:55pm, the AI caller asks the dental receptionist two boring questions: "Do you have a new-patient appointment this week?" and "Do you take Bupa?"
The receptionist is tired. The practice manager is in a meeting. The booking system has one cancellation slot on Thursday, but the website still says "new patients seen within seven days", the insurer directory lists the clinic as accepting Bupa, and last month’s Facebook advert promised a free whitening consultation for new joiners. The receptionist says no appointments until next month. She is not sure about Bupa, and the whitening offer ended ages ago.
Nobody bought anything. Nobody complained. No customer was lost in a visible way.
But one tiny call just found a revenue leak, a compliance risk, a stale marketing claim, and a training problem. It also produced the thing most operations teams never get until too late: a recording of the exact moment the business promise broke.
The promise breaks in the seam
Businesses do not usually lose trust through one grand betrayal. They lose it in seams.
The advert says free returns. The checkout adds a handling fee. The chatbot invents an exception. The branch says the promotion ended. The policy page was updated last quarter but the call-centre script was not. The store associate gives a different warranty from the product page. The quote form excludes a postcode still listed inside the service area.
Each surface looks defensible on its own. Marketing wrote the claim in good faith. Product shipped the form using the rules they had. Operations trained staff against an older script. Compliance approved the policy before the promotion changed. The customer experiences one company. Internally, the promise is spread across six owners and three systems that do not like each other.
Most companies monitor the machine. Very few monitor the promise.
They know whether the website is up, the checkout completed, the call queue exceeded target, the chatbot deflected 40% of tickets. Those are useful metrics, but they answer the wrong class of question. They tell you whether the system functioned. They do not tell you whether the customer was told the truth.
That gap is where a new AI category is forming: continuous promise testing.
Not AI user research. Not customer service automation. Not a synthetic focus group pretending to know what your customers feel. A patrol loop that asks, over and over, whether the promises your business has already chosen to make still survive contact with the real workflow.
A promise is only real if the weakest surface can keep it.
From uptime to truth-time
DevOps teams have used synthetic monitoring for years. A script logs in, adds an item to the basket, runs a test payment, checks an API, confirms the trading flow is alive. It is not trying to behave like a richly motivated human. It is asking a narrow operational question: is the route still open?
The AI version changes the target.
A voice agent can call the clinic. A browser agent can complete the quote form. A chat agent can ask the refund question in three phrasings. An email agent can test whether support gives the same cancellation answer as the help centre. Later, someone will wrap this into in-store checks with human runners, robots, or whatever absurd hybrid call-centre theatre procurement will approve.
The useful question moves from uptime to truth-time.
Truth-time is not philosophical. It is brutally practical:
- Can a budget shopper understand the delivery cost before checkout?
- Can a new patient get the stated appointment window?
- Does the bot honour the refund policy?
- Can a supplier find the portal mentioned in the onboarding email?
- Does a branch still quote a finance promotion after the expiry date?
- Does the quote form quietly reject a postcode the website still advertises?
That is not a sentiment dashboard. It is operational plumbing.
You can see the early pieces already. DeepCall describes AI mystery-shopping calls for service, compliance, brand standards, and scenario accuracy across sectors such as retail, hospitality, automotive, healthcare, and financial services. SyntheticUsers opens a real browser and attempts onboarding, signup, or checkout flows, reporting where a persona became confused or abandoned. Simmerce lets e-commerce teams run AI shopper personas across product pages before spending real traffic.
The category will get sharper when it stops selling fake customers and starts selling alarms.
Nobody serious should care that a model says "the shopper felt mistrustful" unless that maps to a fixable contradiction. The stronger product behaviour is simpler: here is the policy, here is what the customer was told, here is the recording, here is the owner, here is the work item.
A smoke alarm does not write a thought leadership memo about combustion. It makes the right person move.
The boring object that matters
The core product object is not the agent. It is the promise register.
That sounds dull because useful operational software often does. A promise register would have one row per claim:
- Claim: "New patients seen within seven days"
- Owner: practice manager
- Source of truth: booking policy
- Allowed wording: "usually within seven days, subject to availability"
- Surfaces: website, insurer directory, receptionist script, paid search ad
- Risk level: medium
- Expiry date: none
- Probe frequency: weekly
- Escalation: branch manager if contradicted twice
For an e-commerce business, the row might be "free returns within 30 days". For a gym, "cancel any time". For a storage company, "first month half price". For a hotel, "pet-friendly rooms available". For a car dealership, "approved used vehicles include twelve-month warranty".
The agent then converts each row into probes. It calls, chats, clicks, emails, searches, fills forms, uploads a fake document where permitted, and compares the actual answer with the source of truth.
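A register row and its expansion into probes could be sketched as below. This is a minimal illustration, not a reference design: the `Promise` and `Probe` types, their field names, and the `probes_for` helper are all hypothetical, and the rule that an expired claim is still probed with a flipped expectation is an assumption made for the example.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Promise:
    """One row of the promise register. Field names are illustrative."""
    claim: str
    owner: str
    source_of_truth: str
    allowed_wording: str
    surfaces: tuple[str, ...]
    risk_level: str = "medium"
    expiry_date: Optional[date] = None  # None: the claim has no expiry
    probe_every_days: int = 7           # weekly
    escalate_to: str = ""
    escalate_after: int = 2             # contradictions before escalating


@dataclass(frozen=True)
class Probe:
    """One check of one claim on one surface."""
    claim: str
    surface: str
    expected: str


def probes_for(promise: Promise, today: date) -> list[Probe]:
    """Expand a register row into one probe per surface.

    Assumption: an expired claim is still probed, with the expected
    answer flipped, so a surface that keeps quoting it gets caught.
    """
    expired = promise.expiry_date is not None and today > promise.expiry_date
    expected = "no longer offered" if expired else promise.allowed_wording
    return [Probe(promise.claim, surface, expected) for surface in promise.surfaces]


new_patients = Promise(
    claim="New patients seen within seven days",
    owner="practice manager",
    source_of_truth="booking policy",
    allowed_wording="usually within seven days, subject to availability",
    surfaces=("website", "insurer directory", "receptionist script", "paid search ad"),
    escalate_to="branch manager",
)

for probe in probes_for(new_patients, date(2025, 1, 6)):
    print(f"{probe.surface}: expect '{probe.expected}'")
```

The point of the shape is that the agent never invents what to ask. Every probe traces back to a row somebody owns.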
The output should be an exception packet, not an insight paragraph:
- Transcript, screenshot, or recording
- Failed promise
- Expected answer
- Actual answer
- Severity
- Likely owner
- Suggested handoff
- Replay button
The packet then lands where blame already travels: Jira, Zendesk, Slack, Salesforce, a practice-management inbox, a branch scorecard, a compliance queue. Green promises vanish into the background. Red promises become work orders.
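As a rough sketch of that behaviour, the packet can be a plain record and the routing a lookup by owner. Everything named here is an assumption for illustration: the `ExceptionPacket` fields, the queue names in `ROUTES`, the fallback queue, and the exact-match comparison standing in for whatever answer-matching a real product would use. The replay button is left out.

```python
from dataclasses import dataclass, field


@dataclass
class ExceptionPacket:
    """A failed promise, packaged for handoff. Field names are illustrative."""
    failed_promise: str
    expected_answer: str
    actual_answer: str
    severity: str
    likely_owner: str
    evidence: list[str] = field(default_factory=list)  # transcript, screenshot, recording
    suggested_handoff: str = ""


# Hypothetical routing table: the queue each kind of owner already watches.
ROUTES = {
    "practice manager": "practice-management-inbox",
    "branch manager": "branch-scorecard",
    "product owner": "jira",
    "compliance": "compliance-queue",
}
FALLBACK_QUEUE = "operations-inbox"


def route(packet: ExceptionPacket) -> str:
    """Pick a destination queue from the packet's likely owner."""
    return ROUTES.get(packet.likely_owner, FALLBACK_QUEUE)


def packets_from(results: list[dict]) -> list[ExceptionPacket]:
    """Keep only contradictions; results that match expectation produce nothing."""
    return [
        ExceptionPacket(
            failed_promise=r["claim"],
            expected_answer=r["expected"],
            actual_answer=r["actual"],
            severity=r.get("severity", "medium"),
            likely_owner=r["owner"],
            evidence=r.get("evidence", []),
        )
        for r in results
        if r["actual"] != r["expected"]
    ]


results = [
    {   # red: the receptionist contradicts the allowed wording
        "claim": "New patients seen within seven days",
        "expected": "usually within seven days, subject to availability",
        "actual": "no appointments until next month",
        "owner": "practice manager",
        "evidence": ["call-recording.wav"],  # hypothetical file name
    },
    {   # green: the website matches, so no packet is produced
        "claim": "New patients seen within seven days",
        "expected": "usually within seven days, subject to availability",
        "actual": "usually within seven days, subject to availability",
        "owner": "practice manager",
    },
]

for packet in packets_from(results):
    print(route(packet), "<-", packet.failed_promise)
```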
That is the product design distinction. Research tools want users to browse findings. Patrol tools want exceptions to route themselves.
The Monday morning patrol
Take a regional home-services company: plumbing, heating, roofing, twenty branches, franchise-ish chaos. It buys demand through Google ads, local SEO, leaflets, comparison sites, and old-fashioned referrals. Phones still matter. Branches have local habits. Policies live in a mix of spreadsheets, booking software, PDFs, and the memory of whoever has been there longest.
A weekly promise patrol becomes part of the operations stack.
Triggers are not exotic:
- Price page edited
- Promotion launched
- Staffing rota changed
- Service-area rule updated
- Refund policy changed
- New call-centre script issued
- Bad review cluster detected
- Branch manager replaced
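One way to read the triggers above is as events that select which register rows to re-probe. The sketch below assumes each claim is tagged with topics and each event lists the topics it can invalidate. The event names, topic tags, and toy register are all hypothetical, and treating an unknown event as "patrol nothing" is a simplification.

```python
# Hypothetical mapping from change events to the register topics they can invalidate.
TRIGGER_TOPICS = {
    "price_page_edited": {"pricing"},
    "promotion_launched": {"promotion"},
    "service_area_rule_updated": {"coverage"},
    "refund_policy_changed": {"refunds"},
    "new_call_centre_script_issued": {"pricing", "promotion", "coverage", "refunds"},
}

# A toy register: claim text mapped to the topics it touches.
REGISTER = {
    "Call-out fee from £80": {"pricing"},
    "Finance promotion": {"promotion", "pricing"},
    "We cover your postcode": {"coverage"},
}


def claims_to_patrol(event: str, register: dict[str, set[str]]) -> list[str]:
    """Return the claims whose topics overlap those the event can invalidate.

    Unknown events select nothing here; a real system might prefer
    to patrol everything instead.
    """
    topics = TRIGGER_TOPICS.get(event, set())
    return sorted(claim for claim, tags in register.items() if tags & topics)


print(claims_to_patrol("price_page_edited", REGISTER))
print(claims_to_patrol("service_area_rule_updated", REGISTER))
```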
At 6am on Monday, agents pull the latest promise register. They call three branches as an urgent boiler repair. They chat with the website bot as a price-sensitive renter. They complete the quote form from a borderline postcode. They email support about a cancellation fee. They ask about the finance promotion that expired on Friday.
By 8am, three packets exist.
One branch says the call-out fee is waived. The website says "from £80". The quote form adds £80 only after postcode entry. Packet to branch manager: contradiction, estimated lost margin, call recording, suggested script line.
The chatbot tells renters they can book without landlord approval. The policy says they cannot. Packet to product owner and compliance: risky advice, exact prompt, page link, approved wording.
A branch still quotes the expired finance promotion. Packet to operations: old script in use, promotion expired, severity high.
Friday’s operations meeting changes shape. It no longer starts with anecdotes from the loudest regional manager. It starts with failed promises by owner.
That matters commercially because most companies spend heavily to create demand, then allow the promise to decay after handoff. The customer who hears "no, we don’t cover your area" after seeing their postcode in an advert does not always complain. They disappear. They choose the competitor with a cleaner seam.
The most valuable failed promise may be the one no real customer ever reports.
Blame becomes routable
A customer complaint usually arrives as fog.
"Your website is misleading." "The woman on the phone was rude." "I was told something different." "The offer is fake."
Everyone can dodge fog. Marketing says the branch got it wrong. The branch says the website is wrong. Product says the rule came from operations. Compliance says they approved a different wording. The customer support agent apologises with a voucher and nobody fixes the system.
Promise testing changes the unit of accountability. The contradiction arrives pre-labelled:
- Marketing claim versus branch script
- Website copy versus backend rule
- Chatbot answer versus policy
- Directory listing versus appointment availability
- Promotion expiry versus sales behaviour
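Labels like these can be generated mechanically once each surface's answer is recorded. A minimal sketch, assuming answers are short strings and that exact match after light normalisation is good enough; a real product would need a far more forgiving comparison. The `normalise` and `label_contradictions` helpers and the sample answers are made up for illustration.

```python
from itertools import combinations


def normalise(answer: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not count."""
    return " ".join(answer.lower().split())


def label_contradictions(answers: dict[str, str]) -> list[str]:
    """Return an 'A versus B' label for every pair of surfaces that disagree."""
    return [
        f"{a} versus {b}"
        for a, b in combinations(answers, 2)
        if normalise(answers[a]) != normalise(answers[b])
    ]


# Illustrative answers for a single claim, keyed by the surface that gave them.
answers = {
    "marketing claim": "Free returns within 30 days",
    "policy": "free returns within 30 days",
    "branch script": "returns carry a handling fee",
}

for label in label_contradictions(answers):
    print(label)
```

Here the marketing claim and the policy agree, so only the branch script is flagged, once against each of the other two.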
Less plausible deniability is a product feature, but it is also a political problem.
Front-desk staff will hate this if it is sold as surveillance. Call-centre teams will hear "AI snitch" because, in badly run companies, that is exactly how it will be used. Managers will use transcripts for training whether the vendor markets it that way or not.
Healthcare, finance, insurance, and unionised call centres will need guardrails: call recording consent, synthetic-call disclosure, nuisance-call limits, quiet hours, accessibility checks, approved caller IDs, scenario caps. In some settings, secret shopping is normal. In others, it becomes a legal and cultural mess very quickly.
The best implementation is not "catch Sarah saying the wrong thing". It is "the Bupa policy has three conflicting sources and Sarah is the unlucky interface". That distinction has to be in the workflow, not the sales deck.
The promise librarian
Once promises become monitored objects, somebody has to own them.
The job will not sound glamorous. Promise librarian is probably too honest a title to survive HR. But the work is real: maintain the register, expire old claims, map surfaces, assign owners, approve wording, set probe frequency, tune escalation rules, and decide which contradictions deserve human attention.
Most companies already have fragments of this role scattered across compliance, product ops, marketing ops, service training, and branch management. Nobody owns the whole promise graph.
That fragmentation is why the same product can be bought by four budgets:
- CX buys consistency
- Compliance buys risk reduction
- Growth buys recovered conversion
- Operations buys branch discipline
Same patrol, different dashboard. That creates a packaging trap. If the product becomes generic voice analytics, it competes with every call recording, QA, and contact-centre platform. If it becomes generic AI research, it drowns in fake persona theatre.
The wedge is the promise object plus the patrol loop. Price it like coverage, not mystery shopping.
A bronze tier might monitor fifty promises weekly across web and chat. Silver adds phone and branch sampling. Gold adds event-triggered patrols after policy changes, promotion launches, or review spikes. Failure severity could drive routing and reporting, but I would be careful about pricing directly per failure. You do not want customers arguing with the smoke alarm because the invoice got bigger.
Patrol, not prophecy
The obvious counterargument is right: synthetic customers cannot tell you what customers truly want.
They are poor substitutes for interviews, ethnography, usability sessions, complaint analysis, sales calls, and watching a confused person try to complete a real task while muttering under their breath. AI personas can generate confident nonsense with the emotional range of a LinkedIn post. "The shopper felt anxious about the delivery proposition" is often a verbose way of saying the model guessed.
So do not ask them to be prophets.
Ask them to be patrol.
Their defensible role is narrower and more boring: repeatedly test known claims, known policies, and known flows. If the business has not decided what the promise is, the agent cannot rescue it. It will expose organisational mush and call it a contradiction.
There are practical failure modes too. Staff may learn the scripts and game the check. Agents may call at peak times and annoy teams. They may miss accents, relationship customers, local exceptions, or the social work a good receptionist does in thirty seconds. IVRs, CAPTCHAs, spam filters, and suspicious humans will block them. Policy ambiguity will generate false alarms.
Good patrol products will need rate limits, scenario rotation, quiet hours, branch sampling, escalation thresholds, and humility labels. "Contradiction found" is fair. "Customer truth discovered" is overreach.
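Several of those guardrails reduce to small checks run before any probe goes out. The sketch below covers quiet hours, a per-branch weekly call cap, and an escalation threshold. The `Guardrails` type and its default values are placeholders chosen for the example, not recommendations.

```python
from dataclasses import dataclass
from datetime import datetime, time, timedelta


@dataclass(frozen=True)
class Guardrails:
    """Hypothetical limits; the values are placeholders, not recommendations."""
    quiet_start: time = time(21, 0)
    quiet_end: time = time(6, 0)
    max_calls_per_branch_per_week: int = 3
    escalate_after: int = 2


def in_quiet_hours(now: datetime, g: Guardrails) -> bool:
    """True inside the quiet window, which may wrap past midnight."""
    t = now.time()
    if g.quiet_start <= g.quiet_end:
        return g.quiet_start <= t < g.quiet_end
    return t >= g.quiet_start or t < g.quiet_end


def may_call(branch: str, now: datetime,
             call_log: list[tuple[str, datetime]], g: Guardrails) -> bool:
    """Allow a call only outside quiet hours and under the weekly cap."""
    if in_quiet_hours(now, g):
        return False
    window_start = now - timedelta(days=7)
    recent = sum(1 for b, when in call_log if b == branch and when > window_start)
    return recent < g.max_calls_per_branch_per_week


def should_escalate(contradictions: int, g: Guardrails) -> bool:
    """Escalate once a promise has been contradicted often enough."""
    return contradictions >= g.escalate_after


limits = Guardrails()
monday = datetime(2025, 1, 6, 10, 0)
log = [("leeds", monday - timedelta(days=d)) for d in (1, 2, 3)]  # made-up branch

print(may_call("leeds", monday, log, limits))  # cap already reached for this branch
print(may_call("york", monday, log, limits))   # no recent calls logged
```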
The ambition sits in the repetition. Small dull checks, run often, beat theatrical fake customers.
Dirty edge businesses win first
The early winners will not be immaculate software companies with perfect telemetry and product managers debating funnel instrumentation. They will be ordinary multi-location businesses where phones still carry revenue and policy drift is weekly life.
Dental groups. Vets. Clinics. Home services. Car dealerships. Storage companies. Gyms. Hotels. Colleges. Insurers. Local banks. E-commerce businesses with promotions that multiply faster than the team can retire old copy.
These organisations leak money in ways analytics cannot see. A branch says no when the website says yes. A receptionist is unsure about insurance. A checkout fee appears late. A chatbot invents flexibility the policy does not allow. A promotion survives in the call script after finance killed it.
The conventional AI story asks how human the agent can sound. I think that is the wrong race. Sounding human is useful only to the extent that it gets the agent to the seam where the promise breaks.
The winner will not have the most realistic fake customer. The winner will know which promises are worth turning into alarms.
As agents get cheaper, businesses will stop waiting for customers to be the first detectors of their own operational drift. The next competitive advantage may look less like a clever chatbot and more like a quiet patrol that keeps asking the receptionist, the website, the checkout, and the policy binder the same uncomfortable question: are you all still saying the same thing?