There’s a reliable arc to most enterprise AI feature stories. An executive sees a demo. The board approves a budget. An engineering team builds something genuinely impressive in a sandbox. It ships to production. The usage metrics are disappointing. Six months later, the feature is deprioritised.
The problem is rarely the AI. It’s the integration design.
The Question Before the Architecture
Before any technical design decision, there is a single question worth spending disproportionate time on: what job is the user hiring this AI to do?
This sounds obvious. In practice, most AI features are built around what the technology can do rather than what the user needs done at a specific moment in their journey. The result is capability in search of context.
When I built Can I Eat? as a personal proof-of-concept, the answer to that question was narrow and specific: a user with dietary restrictions is looking at a menu or a product and needs a fast, confident answer about whether it’s safe — not a general nutritional analysis, not a recipe suggestion. One question. One answer. Low friction.
That constraint shaped every subsequent decision.
Latency Is a Product Requirement
Enterprise teams often treat latency as an infrastructure concern. It’s a product concern. Users have a patience threshold that varies by context, and that threshold determines whether an AI interaction feels helpful or broken.
For a search autocomplete feature: under 200ms or it disrupts the user’s flow.
For a product recommendation: under 800ms is acceptable because the user expects computation.
For a checkout risk assessment running server-side: 3 seconds is fine if the user never sees it.
The latency budget shapes the model choice, the prompt design, the caching strategy, and the fallback behaviour. These aren’t infrastructure decisions — they’re product decisions that happen to have infrastructure consequences.
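One way to make the budget explicit is to encode it next to the call site. The following is a minimal sketch, not a definitive implementation: the context names, the `call_with_budget` helper and the `model_call` parameter are all hypothetical, and the numbers are simply the thresholds from the examples above.

```python
import asyncio

# Budgets in seconds, copied from the examples above.
# The context names are illustrative assumptions, not a standard.
LATENCY_BUDGET_S = {
    "search_autocomplete": 0.2,
    "product_recommendation": 0.8,
    "checkout_risk_assessment": 3.0,
}


async def call_with_budget(context, model_call, fallback):
    """Await model_call() within the budget for `context`.

    `model_call` is any zero-argument coroutine function (hypothetical:
    in practice it would wrap your model client). If the budget is
    exceeded, the call is cancelled and `fallback` is returned instead.
    """
    budget = LATENCY_BUDGET_S[context]
    try:
        return await asyncio.wait_for(model_call(), timeout=budget)
    except asyncio.TimeoutError:
        return fallback
```

Keeping the budget in one table means a product decision ("autocomplete must answer in 200ms") is visible in code review rather than buried in client configuration.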
The Fallback Is Part of the Feature
A generative AI feature without a well-designed fallback is a reliability risk. In retail, where system availability directly maps to revenue, a degraded AI response must be better — or at least no worse — than no AI response.
The fallback behaviour should be designed before the happy path is fully built. This disciplines the AI integration to be genuinely additive rather than structurally load-bearing. If the system degrades gracefully when the AI component is unavailable, the AI component is correctly positioned as an enhancement. If it doesn’t, the architecture has a problem.
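A fallback-first design can be sketched as a baseline function that is written first, with the AI path layered on top of it. Everything here is an assumption for illustration: `baseline_recommendations`, `recommend`, the product IDs and the `ai_recommender` callable are hypothetical names, not part of any real system.

```python
from typing import Callable, List


def baseline_recommendations(product_id: str) -> List[str]:
    """Hypothetical non-AI fallback, e.g. a precomputed best-seller list."""
    return ["bestseller-1", "bestseller-2", "bestseller-3"]


def recommend(product_id: str,
              ai_recommender: Callable[[str], List[str]]) -> List[str]:
    """Return AI recommendations when available, otherwise the baseline."""
    try:
        results = ai_recommender(product_id)
    except Exception:
        # Any model failure (timeout, rate limit, malformed output)
        # degrades to the baseline instead of surfacing an error.
        return baseline_recommendations(product_id)
    # An empty answer is treated as a failure too.
    return results or baseline_recommendations(product_id)
```

Because `recommend` always has a non-AI answer available, removing or disabling `ai_recommender` changes the quality of the result but not the availability of the page.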
What “Integration” Actually Means
Most AI integrations I review are prompt-and-response wrappers with minimal structural thought about where AI sits in the system. The model is called, the response is displayed, the user either uses it or doesn’t.
A production-grade AI feature has to be more deliberate about:
- Input conditioning — what you send to the model is as important as the model itself. Structured, contextualised input produces significantly more reliable output than raw user input passed directly.
- Output validation — especially for structured responses. Don’t trust the model to return clean JSON. Validate, sanitise, handle the cases where it doesn’t.
- Feedback loops — implicit (did the user act on the recommendation?) and explicit (thumbs up/down). These signals are the only way to know whether the feature is actually working.
- Cost instrumentation — token costs in development are never representative of production costs at scale. Build cost tracking before you need it.
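Two of these points, output validation and cost instrumentation, can be sketched in a few lines. This is an illustrative sketch under stated assumptions: the `Verdict` schema (a `safe` flag and a `reason` string) is a hypothetical shape loosely modelled on the Can I Eat? example, and the per-token prices in `CostTracker` are placeholders to be replaced with a provider's real rates.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class Verdict:
    safe: bool
    reason: str


def parse_verdict(raw: str) -> Optional[Verdict]:
    """Return a Verdict if `raw` is JSON of the expected shape, else None."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None  # the model returned prose or broken JSON
    if not isinstance(data, dict):
        return None
    safe, reason = data.get("safe"), data.get("reason")
    if not isinstance(safe, bool) or not isinstance(reason, str):
        return None  # right syntax, wrong types
    # Sanitise: trim whitespace and cap the length shown to the user.
    return Verdict(safe=safe, reason=reason.strip()[:200])


@dataclass
class CostTracker:
    # Assumed prices per 1,000 tokens; placeholders, not real rates.
    input_price_per_1k: float
    output_price_per_1k: float
    total_cost: float = 0.0
    calls: int = 0

    def record(self, input_tokens: int, output_tokens: int) -> float:
        """Add one call's cost to the running total and return it."""
        cost = (input_tokens / 1000) * self.input_price_per_1k \
            + (output_tokens / 1000) * self.output_price_per_1k
        self.total_cost += cost
        self.calls += 1
        return cost
```

A `None` from `parse_verdict` is the signal to take the fallback path, and `CostTracker.total_cost / CostTracker.calls` gives a per-call figure that can be compared against production traffic projections.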
The Revenue Test
Every AI feature should be evaluated against a simple question: does this generate, protect, or enable revenue in a way that can be measured?
This is not the only metric that matters. But it’s the metric that determines whether a feature earns its infrastructure cost and engineering maintenance overhead in a commercial environment.
AI features that survive in production are the ones where someone can point to a number and say: this feature moved that number. Everything else is eventually deprioritised.
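One way to produce such a number, assuming the feature can be withheld from a control group of sessions, is to compare revenue per session between the two groups. The function names and the holdout setup below are assumptions for illustration, not a method described in this article.

```python
def revenue_per_session(revenue: float, sessions: int) -> float:
    """Average revenue per session; 0.0 when there were no sessions."""
    return revenue / sessions if sessions else 0.0


def relative_lift(exposed_rps: float, control_rps: float) -> float:
    """Relative change in revenue per session, exposed versus control."""
    if control_rps == 0:
        raise ValueError("control revenue per session must be non-zero")
    return (exposed_rps - control_rps) / control_rps
```

A positive `relative_lift` is only a starting point: whether the difference is large enough to trust depends on sample size and on how sessions were assigned to each group.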
The technology is remarkable. The design thinking around it is where most of the value gets left on the table — or captured.
Build for the job to be done. Design the fallback first. Measure what matters. The impressive demo is the easy part.