The Reliability Paradox

Why "Repeatable" AI Isn't Always "Right"

The Trade-off: Intuition vs. Determinism

When we treat an AI as a "Craftsman," we rely on its high-level reasoning to fill in the blanks. A loose, prose-based prompt is "smart" enough to read a requirement like "Tasks are returned newest first" and intuitively build a test for it.

But intuition is non-deterministic. It’s what Deming would call "Special Cause Variation." To achieve the Statistical Process Control (SPC) required for industrial-grade software, we recently shifted our research toward a rigid, metadata-driven pipeline. We traded the AI’s "guessing power" for a repeatable process.

Then, the system went blind.

The "Green Lie" of the Uncalibrated Gauge

In our quest for repeatability, we built a deterministic program designed to scan an OpenAPI spec and generate a test suite. Because the process was now strictly deterministic, the program only used what was explicitly defined in the metadata.

Our business rules were still trapped in plain prose:

"Tasks are returned newest first. Members can only access their own tasks."

The program followed its instructions perfectly. It ignored the "Description" field (unstructured noise) and focused exclusively on the "Schema" (structured signal). It produced a structurally perfect test suite that passed 41 out of 41 tests. Yet not one of those tests verified the sorting requirement: the suite confirmed that an array came back, not the order of the items inside it.
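
To make the blind spot concrete, here is a minimal Python sketch of a schema-only generator. It is not our actual pipeline: the spec fragment, the generate_checks helper, and the createdAt field are all hypothetical names chosen for illustration.

```python
# Hypothetical spec fragment: the business rules live only in "description".
OPERATION = {
    "summary": "List tasks",
    "description": (
        "Tasks are returned newest first. "
        "Members can only access their own tasks."
    ),
    "responses": {
        "200": {
            "content": {
                "application/json": {
                    "schema": {"type": "array", "items": {"type": "object"}}
                }
            }
        }
    },
}


def generate_checks(operation):
    """Return (name, check) pairs derived only from structured schema fields."""
    schema = operation["responses"]["200"]["content"]["application/json"]["schema"]
    checks = []
    if schema.get("type") == "array":
        checks.append(("returns_array", lambda body: isinstance(body, list)))
    # The description is never read, so its rules produce no checks.
    return checks


# A response that violates "newest first": the oldest task comes first.
wrong_order = [
    {"id": 1, "createdAt": "2024-01-01T00:00:00Z"},
    {"id": 2, "createdAt": "2024-03-01T00:00:00Z"},
]

checks = generate_checks(OPERATION)
results = {name: check(wrong_order) for name, check in checks}
print(results)  # {'returns_array': True}
```

Every generated check passes even though the response is in the wrong order, because no check about order was ever generated.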

The Recalibration: Upgrading the Oracle

The fix was not to patch the test suite by hand, to "ask the LLM" to try harder, or to retreat to "fuzzy" prompts in the pipeline. Each of those moves would be a return to manual craftsmanship and "inspection-based" quality. Instead, we applied Deming’s Third Point (cease dependence on inspection to achieve quality): we built quality into the process by upgrading the Oracle (The Master Die). We replaced the prose with machine-readable metadata, moving the requirement from a descriptive sentence to a custom x-ordering constraint, expressed as an OpenAPI extension field.
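
As a rough illustration of that move, the sketch below shows the same rule before and after, written as Python dictionaries. OpenAPI permits custom fields prefixed with x-, but the shape of x-ordering used here (a field name plus a direction) is an assumption made for this example, as are the createdAt field and the extract_constraints helper.

```python
# Before: the rule is a sentence that a schema-only generator never reads.
BEFORE = {
    "description": "Tasks are returned newest first.",
    "schema": {"type": "array", "items": {"type": "object"}},
}

# After: the same rule as structured metadata.
# The keys "field" and "direction" are hypothetical, not a standard.
AFTER = {
    "description": "Tasks are returned newest first.",
    "schema": {
        "type": "array",
        "items": {"type": "object"},
        "x-ordering": {"field": "createdAt", "direction": "desc"},
    },
}


def extract_constraints(fragment):
    """Collect every x- extension field found on the schema."""
    schema = fragment["schema"]
    return {key: value for key, value in schema.items() if key.startswith("x-")}


print(extract_constraints(BEFORE))  # {}
print(extract_constraints(AFTER))
# {'x-ordering': {'field': 'createdAt', 'direction': 'desc'}}
```

The description can stay for human readers; what changes is that the rule now also exists in a place the deterministic program actually looks.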

The Result: The moment the deterministic program saw the structured metadata, it produced a high-resolution measurement (aka test suite). The generated suite stopped checking only whether an array was returned and started verifying the required sort order.
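
A sketch of what such a generated check might look like follows. It assumes the hypothetical x-ordering shape of a field name plus a direction, and it assumes timestamps are ISO 8601 strings in one uniform UTC format, so that plain string comparison matches chronological order. The helper names are illustrative only.

```python
def is_ordered(items, field, direction):
    """True if items are sorted by field in the given direction (ties allowed)."""
    values = [item[field] for item in items]
    pairs = list(zip(values, values[1:]))
    if direction == "desc":
        return all(a >= b for a, b in pairs)
    return all(a <= b for a, b in pairs)


def generate_checks(schema):
    """Return (name, check) pairs derived from structured schema fields."""
    checks = []
    if schema.get("type") == "array":
        checks.append(("returns_array", lambda body: isinstance(body, list)))
    ordering = schema.get("x-ordering")  # hypothetical extension field
    if ordering:
        checks.append((
            "ordered_by_" + ordering["field"],
            lambda body: is_ordered(body, ordering["field"], ordering["direction"]),
        ))
    return checks


SCHEMA = {
    "type": "array",
    "items": {"type": "object"},
    "x-ordering": {"field": "createdAt", "direction": "desc"},
}

oldest_first = [
    {"id": 1, "createdAt": "2024-01-01T00:00:00Z"},
    {"id": 2, "createdAt": "2024-03-01T00:00:00Z"},
]
newest_first = list(reversed(oldest_first))

for body in (oldest_first, newest_first):
    print({name: check(body) for name, check in generate_checks(SCHEMA)})
# {'returns_array': True, 'ordered_by_createdAt': False}
# {'returns_array': True, 'ordered_by_createdAt': True}
```

The generator is still fully deterministic. It simply has one more structured fact to measure against, so a wrongly ordered response can no longer pass.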

The Lesson: Metadata is the New Logic

The move to an AI Foundry requires a hard realization: If you want a deterministic, repeatable process, you cannot rely on AI intuition. In an industrial model, "Prose" is scrap.

Requirements must therefore move out of prose. If the generator is to produce deterministic artifacts, the behavior of the system must be expressed as machine-readable constraints. In this model, metadata becomes the new logic of the system.