From ChatGPT pilot to production system: the architecture decisions that matter

The ChatGPT pilot is a conversation. The production system is an architecture. The gap between the two is where the majority of AI deployments die — and it is not because the pilot was wrong. It is because the architecture decisions that determine whether the pilot can scale were never made.

Five decisions define the production architecture. They are listed in the order they need to be made:

  1. Where does the model run? Hosted by the vendor, in your cloud, or in a hybrid arrangement. This decision dictates data egress, audit posture, latency, and renewal pricing for the next five years. Most pilots punt this decision until production, by which point they are already locked in.
  2. What is the system of record? The AI's output goes somewhere — your ERP, your CRM, a data warehouse, an audit log. That destination needs to exist, be production-grade, and be schema-stable before the model is wired to it. Building the model first and the destination later is the most expensive mistake in this category.
  3. How is the confidence framework structured? It has four parts: a threshold per workflow, an exception path with a named owner, an audit log schema, and a quarterly review cadence. Without these, the system has no controls and no auditability. The pilot did not need them; production cannot exist without them.
  4. Who owns the operational handover? Four roles do: the model owner, the integration engineer, the operations subject-matter expert (SME), and the translator. If any of those four roles is not staffed by name, the production deployment has a structural failure point, not a technical one.
  5. What is the recalibration cadence? That means a quarterly threshold review, a semi-annual model evaluation, and an annual specification rewrite. Production AI is not a static system. The decisions made in month three need to be re-examined in month nine, or the system silently degrades while everyone reports that it is working.
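
The confidence framework in decision 3 is the easiest of the five to make concrete. The following is a minimal Python sketch, not a reference implementation. The names `WorkflowPolicy`, `AuditRecord`, and `route_output`, the field names, and the sample values are all hypothetical, and it assumes the model reports a confidence score between 0 and 1.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class WorkflowPolicy:
    """Confidence controls for one workflow (hypothetical field names)."""
    workflow: str
    threshold: float        # minimum confidence for automatic handling
    exception_owner: str    # named person who owns the exception path

    def __post_init__(self) -> None:
        if not 0.0 <= self.threshold <= 1.0:
            raise ValueError("threshold must be between 0 and 1")
        if not self.exception_owner.strip():
            raise ValueError("the exception path needs a named owner")


@dataclass(frozen=True)
class AuditRecord:
    """One audit log row: what was decided, under which threshold, and when."""
    workflow: str
    item_id: str
    confidence: float
    threshold: float
    route: str              # "auto" or "exception"
    owner: Optional[str]    # set only when routed to the exception path
    decided_at: str         # ISO 8601 timestamp


def route_output(policy: WorkflowPolicy, item_id: str, confidence: float,
                 now: Optional[datetime] = None) -> AuditRecord:
    """Route one model output and return the audit record for that decision."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    decided_at = (now or datetime.now(timezone.utc)).isoformat()
    if confidence >= policy.threshold:
        return AuditRecord(policy.workflow, item_id, confidence,
                           policy.threshold, "auto", None, decided_at)
    # Below threshold: hand the item to the named owner instead of posting it.
    return AuditRecord(policy.workflow, item_id, confidence,
                       policy.threshold, "exception",
                       policy.exception_owner, decided_at)


# Illustrative values only; a real threshold comes out of the quarterly review.
policy = WorkflowPolicy("invoice_matching", threshold=0.90,
                        exception_owner="A. Example")
print(route_output(policy, "item-001", 0.95).route)  # auto
print(route_output(policy, "item-002", 0.72).route)  # exception
```

Recording the threshold in every audit row is a deliberate choice in this sketch: when the quarterly review changes the threshold, earlier decisions can still be read against the rule that applied at the time.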

In an engagement involving roughly 465,000 transactions flowing between two enterprise systems, this architecture work — the five decisions, plus the resulting documentation — took six months to complete. The model itself was working in week three. The system did not go live for another twenty weeks. Everything between those points was architecture, not the model.

If your pilot is producing impressive demos but your team cannot answer those five questions, you do not have a path to production. You have a science project that will fail at the same point every other unscoped pilot fails: the moment the operation needs to absorb it.
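
One way to make that test repeatable is to write the five questions down as a checklist that fails loudly when an answer is missing. The sketch below is illustrative only: the keys in `DECISIONS` and `REQUIRED_ROLES` are hypothetical labels for the five decisions and four roles described above, and a non-empty written answer is a much weaker check than a reviewed one.

```python
from typing import Dict, List

# Hypothetical keys for the five decisions, in the order they need to be made.
DECISIONS = (
    "where_model_runs",
    "system_of_record",
    "confidence_framework",
    "operational_handover",
    "recalibration_cadence",
)

# The four handover roles, each of which must be staffed by name.
REQUIRED_ROLES = (
    "model_owner",
    "integration_engineer",
    "operations_sme",
    "translator",
)


def open_questions(answers: Dict[str, str], roles: Dict[str, str]) -> List[str]:
    """Return every decision without a written answer and every unnamed role."""
    gaps = [f"decision not documented: {d}"
            for d in DECISIONS if not answers.get(d, "").strip()]
    gaps += [f"role not staffed by name: {r}"
             for r in REQUIRED_ROLES if not roles.get(r, "").strip()]
    return gaps


# Illustrative pilot state: two decisions written down, one role named.
answers = {
    "where_model_runs": "vendor-hosted",
    "system_of_record": "ERP",
}
roles = {"model_owner": "A. Example"}

for gap in open_questions(answers, roles):
    print(gap)
```

An empty list from `open_questions` does not mean the pilot is ready. It only means the team has something written down to review for each question.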

The fastest way to know whether your pilot has a path forward is not to look at the model's outputs. It is to look at whether anyone on the team can answer all five questions without checking a Slack thread.
