Specification-Centric AI Development (SCD)
The “Fix the Factory” Paradigm
Process Improvement of Prompt Generators for Deterministic AI Code Production
Version: 1.1 (Research Draft)
Date: [TBD]
Author: [TBD]
1. Executive Summary
Most AI-assisted development today follows an implicit pattern:
Generate code.
Discover defect.
Patch the code.
Move on.
This paper proposes a different model:
Do not fix the product. Fix the factory.
Specification-Centric Development (SCD) treats AI not as an autonomous agent, but as a deterministic compiler whose behavior is governed by explicit prompt generators.
When a defect is discovered:
The implementation is deleted.
A diagnostic AI classifies the root cause.
The specification or prompt generator is amended.
The entire system is regenerated.
No manual coding is permitted.
The hypothesis:
Software quality in AI-driven systems is a function of specification and generator clarity, not iterative patching.
This research evaluates whether structured prompt generators can be improved through a formal process improvement loop, resulting in a measurable reduction in defect recurrence.
2. The Problem: The Auditability Gap in AI Development
AI coding workflows suffer from three structural weaknesses:
Opaque reasoning — agent decisions are internal.
Code patching bias — developers fix outputs instead of causes.
Lack of versioned instruction discipline — prompts evolve informally.
When a defect occurs, the natural tendency is to adjust the code.
This creates:
Non-reproducible changes
Hidden instruction drift
No systematic improvement of the generation mechanism
SCD reverses this logic.
The generator, not the code, becomes the unit of improvement.
3. Research Objective
This research aims to determine:
Can prompt generators be treated as production systems subject to process improvement?
Can defects be consistently classified as instruction gaps?
Does amending generators reduce recurrence across regenerations?
Can AI diagnose and amend its own generation instructions?
The focus is not on AI creativity, but on AI repeatability.
4. Experimental Constraint
To isolate variables, the experiment is constrained to:
Stack: Node.js + Express
Persistence: SQLite
API Style: REST
Validation: Supertest integration tests
These constraints:
Reduce architectural variance
Allow deterministic test validation
Focus the research on specification clarity
This is not a limitation of the methodology, but an experimental boundary.
5. The Multi-Layer Specification Model
SCD divides instruction into three explicit layers:
| Layer | Type | Purpose |
| --- | --- | --- |
| Essential Functional Requirements | Domain | What the system must do |
| Domain Quirks / NFRs | Domain | Safety, ordering, privacy nuances |
| Architectural Standards | Technical | Stack-specific implementation constraints |
Each layer is versioned and externalized.
No implementation code is considered authoritative.
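As a concrete sketch of "versioned and externalized," the three layers could be kept as plain data documents that generators consume. The file shape, field names, and rule IDs below are illustrative assumptions, not a format prescribed by the paper:

```javascript
// Hypothetical externalized specification layers; field names and rule IDs
// are illustrative. Each layer carries its own version string.
const functionalRequirements = {
  layer: "functional",
  version: "1.3.0",
  rules: [
    { id: "FR-01", text: "POST /orders creates an order and returns 201" },
  ],
};

const domainQuirks = {
  layer: "quirks",
  version: "1.1.0",
  rules: [
    { id: "Q-01", text: "Order items must be returned in insertion order" },
  ],
};

const architecturalStandards = {
  layer: "architecture",
  version: "2.0.0",
  rules: [
    { id: "S-01", text: "All persistence goes through a single SQLite module" },
  ],
};

// No implementation code is authoritative: generators read only these layers.
const specification = {
  functionalRequirements,
  domainQuirks,
  architecturalStandards,
};
console.log(Object.keys(specification).length); // 3
```

Keeping each layer as data (rather than prose buried in a prompt) is what makes the later diagnostic step able to point at a specific versioned rule.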
6. Prompt Generators as Factories
Rather than writing a single prompt, SCD uses structured generators:
Requirements Generator
Project Setup Generator
OpenAPI Generator
Database Generator
Test Suite Generator
Diagnostic Generator
Each generator:
Is versioned
Is deterministic
Is externalized
Produces artifacts from inputs
The generators collectively constitute the factory.
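One minimal way to model a generator with these four properties is as a pure, versioned function from specification inputs to a prompt artifact. The interface below is a sketch under that assumption; the paper does not prescribe an API:

```javascript
// Hypothetical generator interface: versioned, deterministic, externalized.
// A generator maps specification inputs to a prompt artifact with no hidden state.
function makeGenerator(name, version, template) {
  return {
    name,
    version,
    generate(spec) {
      // Deterministic: same spec + same template => same prompt text.
      return {
        generator: `${name}@${version}`,
        prompt: template(spec),
      };
    },
  };
}

// Illustrative instance standing in for the paper's OpenAPI Generator.
const openApiGenerator = makeGenerator(
  "openapi-generator",
  "0.2.0",
  (spec) =>
    `Produce an OpenAPI 3 document covering: ${spec.rules
      .map((r) => r.id)
      .join(", ")}`
);

const artifact = openApiGenerator.generate({
  rules: [{ id: "FR-01" }, { id: "FR-02" }],
});
console.log(artifact.generator); // "openapi-generator@0.2.0"
```

Because every artifact records the generator name and version that produced it, defect provenance can be traced back to a specific factory revision.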
7. The Diagnostic Generator (The “Coroner”)
The Diagnostic Generator is the core research instrument.
Inputs:
Functional Requirements
NFR / Quirks
Architectural Standards
Generator versions
Generated artifacts
Failing test
Failure log
Process:
Map failing test to requirement rule.
Identify violated constraint.
Classify defect:
R1 — Missing Functional Requirement
R2 — Ambiguous Requirement
Q1 — Missing Domain Quirk
S1 — Missing Architectural Standard
G1 — Generator Omission
G2 — Generator Misinterpretation
Propose amendment to:
Specification layer, or
Generator instruction
Prohibit direct code modification.
This enforces factory-level correction.
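The taxonomy above can be encoded as a routing table that maps each defect class to the instruction layer its amendment must target. The class codes come from the paper; the routing assignments below are an illustrative assumption:

```javascript
// Defect classes from the paper, each routed to the layer that receives
// the amendment. The routing table itself is an illustrative assumption.
const DEFECT_CLASSES = {
  R1: { meaning: "Missing Functional Requirement", amend: "functional-requirements" },
  R2: { meaning: "Ambiguous Requirement", amend: "functional-requirements" },
  Q1: { meaning: "Missing Domain Quirk", amend: "domain-quirks" },
  S1: { meaning: "Missing Architectural Standard", amend: "architectural-standards" },
  G1: { meaning: "Generator Omission", amend: "generator-instructions" },
  G2: { meaning: "Generator Misinterpretation", amend: "generator-instructions" },
};

// The coroner's verdict never targets code: every classification resolves
// to a specification layer or a generator instruction.
function amendmentTarget(classification) {
  const entry = DEFECT_CLASSES[classification];
  if (!entry) throw new Error(`Unknown defect class: ${classification}`);
  return entry.amend;
}

console.log(amendmentTarget("Q1")); // "domain-quirks"
```

A classification that cannot be routed to an instruction layer is rejected outright, which is how the "prohibit direct code modification" rule becomes mechanical rather than advisory.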
8. The Regeneration Loop
The SCD loop operates as follows:
Generate full codebase from generators.
Run Supertest suite.
If failure:
Invoke Diagnostic Generator.
Classify defect.
Amend appropriate instruction layer.
Delete generated code.
Regenerate entire system.
Repeat until 100% of tests pass.
No manual code edits are allowed.
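The loop above can be sketched as a single driver function. Here `generateSystem`, `runTests`, `diagnose`, and `amend` are stand-ins for the real generators, the Supertest suite, and the Diagnostic Generator; none of these names or signatures are prescribed by the paper:

```javascript
// Illustrative SCD regeneration loop. All four callbacks are hypothetical
// stand-ins for the real generators, test suite, and coroner.
function regenerationLoop({ generateSystem, runTests, diagnose, amend }, spec, maxCycles = 10) {
  for (let cycle = 1; cycle <= maxCycles; cycle++) {
    const system = generateSystem(spec); // full regeneration, never a patch
    const result = runTests(system);
    if (result.passed) return { cycle, spec }; // all tests green: done
    const finding = diagnose(spec, system, result.failure);
    spec = amend(spec, finding); // amend instructions, not code
    // The generated code is discarded: the next iteration regenerates it.
  }
  throw new Error("Did not reach green within maxCycles");
}

// Toy run: the fake suite fails until the spec contains rule "Q-01".
const outcome = regenerationLoop(
  {
    generateSystem: (spec) => ({ rules: [...spec.rules] }),
    runTests: (sys) =>
      sys.rules.includes("Q-01")
        ? { passed: true }
        : { passed: false, failure: "ordering test failed" },
    diagnose: () => ({ class: "Q1", addRule: "Q-01" }),
    amend: (spec, finding) => ({ rules: [...spec.rules, finding.addRule] }),
  },
  { rules: ["FR-01"] }
);
console.log(outcome.cycle); // 2
```

Note that only the specification object survives between cycles; the generated system is a throwaway value, which is the loop's enforcement of "no manual code edits."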
9. Data Collection
Each defect is logged:
Defect ID
Generator versions
Requirement version
Classification
Root cause explanation
Amendment applied
Regeneration result
Recurrence status
Planned metrics (values TBD):
% defects by layer
Mean regeneration cycles to green
Recurrence rate after amendment
Time comparison: spec fix vs manual patch
This data forms the empirical backbone of the study.
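A defect-log entry covering the fields listed above might look like the record below. The field names, validation helper, and sample values are illustrative assumptions, not the study's actual schema or data:

```javascript
// Hypothetical defect-log entry matching the fields listed above;
// names and sample values are illustrative, not real study data.
function logDefect(entry) {
  const required = [
    "defectId", "generatorVersions", "requirementVersion", "classification",
    "rootCause", "amendment", "regenerationResult", "recurred",
  ];
  for (const field of required) {
    if (!(field in entry)) throw new Error(`Missing field: ${field}`);
  }
  // Freeze the record so a logged defect cannot be silently rewritten.
  return Object.freeze({ ...entry, loggedAt: new Date().toISOString() });
}

const record = logDefect({
  defectId: "D-0007",
  generatorVersions: { "test-suite-generator": "0.3.1" },
  requirementVersion: "1.3.0",
  classification: "Q1",
  rootCause: "Spec never stated that order items preserve insertion order",
  amendment: "Added quirk Q-01 to the Domain Quirks layer",
  regenerationResult: "green",
  recurred: false,
});
console.log(record.classification); // "Q1"
```

Requiring every field up front keeps the log analyzable: recurrence rate and defects-per-layer fall directly out of `classification` and `recurred` columns.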
10. Hypothesis
Primary Hypothesis:
In AI-driven backend generation, most defects are instruction gaps, not stochastic generation errors.
Secondary Hypothesis:
Process improvement of prompt generators reduces recurrence of defect classes over successive regenerations.
11. Deliberate Non-Goals
This research does not attempt to:
Eliminate regeneration cost.
Optimize generation speed.
Compare models (GPT vs Claude vs others).
Replace traditional software engineering.
The focus is narrow:
Can structured prompt generators be hardened like production systems?
12. Contribution to the AI Landscape
Current AI discourse focuses on:
Agent autonomy
Multi-step reasoning
Tool use
This research shifts attention to:
Deterministic instruction design
Versioned specification discipline
Factory-level improvement
If successful, SCD offers:
Auditability
Reproducibility
Clear defect provenance
Structured AI governance
13. Current Status
Implemented:
Requirements Generator
Diagnostic Generator (v1)
Structured defect classification
Controlled backend domain
In Progress:
OpenAPI Generator
Database Generator
Test Suite Generator
Empirical defect logging
TBD:
Dataset size
Statistical outcomes
Recurrence analysis
Comparative studies
14. Conclusion
SCD reframes AI coding from:
“Generate and patch”
to:
“Specify, generate, diagnose, regenerate.”
The research does not ask whether AI can write code.
It asks:
Can AI systems be improved by refining the instructions that generate them?
If so, prompt generators become production assets subject to continuous improvement.
And AI becomes not an agent, but a compiler.