The AI Software Foundry is not a theory exercise.
Each principle must survive contact with real systems.
This section documents practical experiments designed to test the model under controlled conditions. Each experiment explores a specific dimension of the Foundry:
Deterministic generation
Measurement stability
Oracle completeness
Regeneration discipline
Gauge calibration
Failures are recorded. Recalibrations are documented.
The goal is not to prove the model correct but to stress it.
The objective was to determine whether AI-generated test suites can be industrialized under deterministic process control using structured metadata as the sole input.
This experiment explored whether a strictly deterministic pipeline could replace intuition-based prompt generation when producing verification logic (a Supertest suite) from an OpenAPI specification.
Build a deterministic program that scans an OpenAPI specification.
Generate a test suite using only structured schema metadata.
Exclude all prose-based descriptions from the pipeline.
Measure output stability and repeatability across runs.
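The steps above can be sketched roughly as follows. This is a minimal illustration, not the generator used in the experiment: the names PROSE_KEYS, strip_prose, generate_cases, and render are hypothetical, the sample specification is invented, and the sketch emits language-neutral test-case descriptors rather than Supertest source.

```python
import json

# Keys treated as prose and excluded from the pipeline (an assumption).
# Caveat: this also drops a schema property that happens to be named
# "description"; a real implementation would need to be schema-aware.
PROSE_KEYS = {"description", "summary"}
HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options"}


def strip_prose(node):
    """Return a copy of the spec with prose fields removed at every level."""
    if isinstance(node, dict):
        return {k: strip_prose(v) for k, v in node.items() if k not in PROSE_KEYS}
    if isinstance(node, list):
        return [strip_prose(item) for item in node]
    return node


def generate_cases(spec):
    """Derive one test-case descriptor per (path, method, status) triple."""
    clean = strip_prose(spec)
    cases = []
    # Sorting at every level removes any dependence on input key order.
    for path in sorted(clean.get("paths", {})):
        operations = clean["paths"][path]
        for method in sorted(m for m in operations if m in HTTP_METHODS):
            responses = operations[method].get("responses", {})
            for status in sorted(responses):
                cases.append(
                    {"method": method.upper(), "path": path, "expect_status": status}
                )
    return cases


def render(cases):
    """Serialize with sorted keys so identical input yields identical text."""
    return json.dumps(cases, sort_keys=True, indent=2)


# Invented sample specification, used only for illustration.
SPEC = {
    "paths": {
        "/tasks": {
            "get": {
                "summary": "List tasks",
                "description": "Tasks must be returned newest first.",
                "responses": {
                    "200": {"description": "OK"},
                    "401": {"description": "Unauthorized"},
                },
            },
            "post": {"responses": {"201": {"description": "Created"}}},
        }
    }
}

CASES = generate_cases(SPEC)
```

Note that the ordering rule in the sample lives only in a description field, so it never reaches the generated cases.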
The system achieved full repeatability.
Identical input produced identical test artifacts.
The Gauge reported 41/41 passing tests.
The process appeared stable.
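One way to measure this kind of repeatability is to fingerprint the generated artifact on each run and compare digests. The sketch below assumes a stand-in generator (generate_artifact) and hypothetical helper names; it is not the Gauge described in this section.

```python
import hashlib


def generate_artifact(spec):
    """Stand-in generator (hypothetical): one line per path and method, sorted."""
    lines = []
    for path in sorted(spec["paths"]):
        for method in sorted(spec["paths"][path]):
            lines.append(f"{method.upper()} {path}")
    return "\n".join(lines) + "\n"


def fingerprint(artifact):
    """SHA-256 digest of the artifact bytes."""
    return hashlib.sha256(artifact.encode("utf-8")).hexdigest()


def is_repeatable(spec, runs=5):
    """True when every run yields a byte-identical artifact."""
    digests = {fingerprint(generate_artifact(spec)) for _ in range(runs)}
    return len(digests) == 1


# Invented sample input, keys deliberately out of order.
SPEC = {"paths": {"/tasks": {"post": {}, "get": {}}, "/members": {"get": {}}}}
REPEATABLE = is_repeatable(SPEC)
```

Repeating runs inside one process will not expose every source of nondeterminism (for example, per-process hash randomization), so a stricter check would compare digests across separate process invocations.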
Two critical business rules existed only in prose:
Tasks must be returned newest first.
Members may access only their own tasks.
Because these constraints were not encoded in machine-readable metadata, the deterministic generator ignored them.
The resulting test suite was structurally correct but logically incomplete.
The system passed every test.
The requirement was still violated.
This was the “Green Lie”: a calibrated but misaligned Gauge.
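The gap can be illustrated with a small sketch that uses the access rule as the example. The response body, field names (id, owner_id), and function names are hypothetical; the point is only that a structural check can pass while a business rule is violated.

```python
def check_schema(tasks):
    """Structural check: each task has an integer id and a string owner_id."""
    return all(
        isinstance(t.get("id"), int) and isinstance(t.get("owner_id"), str)
        for t in tasks
    )


def check_own_tasks_only(tasks, member_id):
    """Business-rule check: every returned task belongs to the requesting member."""
    return all(t["owner_id"] == member_id for t in tasks)


# Hypothetical response to a request made by member "m1".
# It is structurally valid but leaks a task owned by "m2".
RESPONSE = [
    {"id": 1, "owner_id": "m1"},
    {"id": 2, "owner_id": "m2"},
]

SCHEMA_OK = check_schema(RESPONSE)                # the only check a schema-driven Gauge has
ACCESS_OK = check_own_tasks_only(RESPONSE, "m1")  # the check the prose rule implies
```

A Gauge built from schema metadata alone contains only the first check, so it reports green on this response.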
The solution was not to patch the test suite.
Nor was it to reintroduce intuitive LLM reasoning.
Instead, the Oracle was upgraded:
Prose was converted into structured metadata.
A custom ordering constraint was introduced.
Access rules were encoded as explicit constraints.
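As an illustration of what such an upgrade might look like, the sketch below attaches the two rules to an operation as OpenAPI specification extensions (fields prefixed with x-). The extension names x-ordering and x-access, their shapes, and the extract_constraints helper are assumptions made for this example, not the encoding used in the experiment.

```python
# Operation object as a Python dict. The prose stays for human readers,
# but the pipeline reads only the structured fields.
OPERATION = {
    "description": (
        "Tasks must be returned newest first. "
        "Members may access only their own tasks."
    ),
    "responses": {"200": {}},
    # Hypothetical extension fields; names and shapes are illustrative.
    "x-ordering": {"field": "created_at", "direction": "desc"},
    "x-access": {"role": "member", "scope": "owner", "owner_field": "owner_id"},
}


def extract_constraints(operation):
    """Collect machine-readable constraints; prose fields are never read."""
    constraints = []
    ordering = operation.get("x-ordering")
    if ordering is not None:
        if ordering["direction"] not in ("asc", "desc"):
            raise ValueError(f"unknown direction: {ordering['direction']}")
        constraints.append(("ordering", ordering["field"], ordering["direction"]))
    access = operation.get("x-access")
    if access is not None:
        constraints.append(("access", access["role"], access["owner_field"]))
    return constraints


CONSTRAINTS = extract_constraints(OPERATION)
```

An operation that carries the same rules only in its description yields no constraints at all, which is the blind spot the upgrade removes.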
When the deterministic generator consumed the upgraded Oracle, it produced a higher-resolution measurement automatically.
The test suite now verified sort order explicitly.
No change was made to the artifact logic.
Only the Oracle evolved.
Deterministic generation was preserved.
Repeatability was maintained.
The measurement system improved in precision.
The defect was eliminated at the process level.
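The recalibration can be sketched as a single, unchanged generator run against two versions of the Oracle. Everything here is hypothetical (the x-ordering field, the function names, the sample body), and timestamps are assumed to be ISO 8601 strings in one format and time zone so that string comparison matches chronological order.

```python
def generate_checks(operation):
    """Return (name, predicate) pairs derived only from structured metadata."""
    checks = [("is_list", lambda body: isinstance(body, list))]
    ordering = operation.get("x-ordering")  # hypothetical extension field
    if ordering:
        field = ordering["field"]
        descending = ordering["direction"] == "desc"

        def ordered(body, field=field, descending=descending):
            values = [item[field] for item in body]
            return values == sorted(values, reverse=descending)

        checks.append((f"ordered_by_{field}", ordered))
    return checks


def run(checks, body):
    """Apply every generated check to a response body."""
    return {name: predicate(body) for name, predicate in checks}


# Same prose in both Oracles; only the second carries the structured constraint.
OLD_ORACLE = {"description": "Tasks must be returned newest first."}
NEW_ORACLE = {**OLD_ORACLE, "x-ordering": {"field": "created_at", "direction": "desc"}}

# Hypothetical response body, returned oldest first.
BODY = [
    {"created_at": "2024-01-01T09:00:00Z"},
    {"created_at": "2024-01-02T09:00:00Z"},
]

OLD_REPORT = run(generate_checks(OLD_ORACLE), BODY)
NEW_REPORT = run(generate_checks(NEW_ORACLE), BODY)
```

With the old Oracle the report contains only the structural check and is fully green; with the upgraded Oracle the same generator emits an ordering check that fails on the misordered body.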
The experiment demonstrated:
Determinism without structured metadata leads to blind spots.
Measurement stability is meaningless without specification completeness.
Prose cannot function as logic in a deterministic pipeline.
Metadata is the new logic of the system.
This experiment exercised:
Design for Verification — converting prose to constraints.
Measurement — deterministic Gauge generation.
Regeneration — upgrading the Oracle instead of patching artifacts.
The Foundry Model — Oracle → Generator → Gauge alignment.