The ChatGPT pilot is a conversation. The production system is an architecture. The gap between the two is where most AI deployments die, and not because the pilot was wrong. They die because the architecture decisions that determine whether the pilot can scale were never made.
Five decisions that define the production architecture, in the order they need to be made:
- Where does the model run? Hosted by the vendor, in your cloud, or in a hybrid arrangement. This decision dictates data egress, audit posture, latency, and renewal pricing for the next five years. Most pilots punt this decision until production, by which point they are already locked in.
- What is the system of record? The AI's output goes somewhere — your ERP, your CRM, a data warehouse, an audit log. That destination needs to exist, be production-grade, and be schema-stable before the model is wired to it. Building the model first and the destination later is the most expensive mistake in this category.
- How is the confidence framework structured? Threshold per workflow, exception path with named owner, audit log schema, quarterly review cadence. Without this, the system has no controls and no auditability. The pilot did not need it; production cannot exist without it.
- Who owns the operational handover? The model owner, the integration engineer, the operations SME, and the translator. If any of those four roles is not staffed by name, the production deployment has a structural failure point — not a technical one.
- What is the recalibration cadence? Quarterly threshold review, semi-annual model evaluation, annual specification rewrite. Production AI is not a static system. The decisions made in month three need to be re-examined in month nine, or the system silently degrades while everyone reports that it is working.
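To make the confidence framework less abstract, here is a minimal sketch of what "threshold per workflow, exception path with named owner, audit log schema, quarterly review cadence" could look like as data plus one routing function. It assumes each model output arrives with a confidence score between 0 and 1. `WorkflowPolicy`, `AuditRecord`, `route`, the 0.92 threshold, the owner name, and the 91-day review interval are hypothetical illustrations, not taken from any specific engagement or library. The sketch also does not write to a real system of record; "auto" simply marks outputs that would go there.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass(frozen=True)
class WorkflowPolicy:
    """Hypothetical per-workflow confidence policy; field names and values are illustrative."""
    workflow: str
    threshold: float                # minimum confidence for straight-through processing
    exception_owner: str            # the named person who works the exception queue
    last_review: date               # when the threshold was last reviewed
    review_interval_days: int = 91  # roughly quarterly (assumed interval)

    def review_due(self, today: date) -> bool:
        """True once the quarterly threshold review is due."""
        return today - self.last_review >= timedelta(days=self.review_interval_days)


@dataclass(frozen=True)
class AuditRecord:
    """One row of a hypothetical audit log schema: every routed output gets exactly one."""
    workflow: str
    record_id: str
    confidence: float
    threshold: float
    decision: str          # "auto" (to the system of record) or "exception"
    owner: Optional[str]   # set only when the output went to the exception path


def route(policy: WorkflowPolicy, record_id: str, confidence: float,
          audit_log: List[AuditRecord]) -> str:
    """Route one model output by threshold and write its audit entry in the same step."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    decision = "auto" if confidence >= policy.threshold else "exception"
    owner = policy.exception_owner if decision == "exception" else None
    audit_log.append(AuditRecord(policy.workflow, record_id, confidence,
                                 policy.threshold, decision, owner))
    return decision


if __name__ == "__main__":
    policy = WorkflowPolicy("invoice_matching", threshold=0.92,
                            exception_owner="A. Named Owner",
                            last_review=date(2024, 1, 15))
    log: List[AuditRecord] = []
    print(route(policy, "TX-0001", 0.97, log))  # auto
    print(route(policy, "TX-0002", 0.71, log))  # exception
    print(policy.review_due(date(2024, 5, 1)))  # True
```

The design choice worth noting is that routing and logging happen in one call: no output can reach either the system of record or the exception queue without leaving an audit entry that records the threshold in force at the time. That is what lets a quarterly review compare outcomes against the thresholds that actually produced them.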
In an engagement involving roughly 465,000 transactions flowing between two enterprise systems, this architecture work — the five decisions, plus the resulting documentation — took six months to complete. The model itself was working in week three. The system did not go live for another twenty weeks. Everything between those points was architecture, not the model.
If your pilot is producing impressive demos but your team cannot answer those five questions, you do not have a path to production. You have a science project that will fail at the same point every other unscoped pilot fails: the moment the operation needs to absorb it.
The fastest way to know whether your pilot has a path forward is not to look at the model's outputs. It is to look at whether anyone on the team can answer all five questions without checking a Slack thread.