AI Coding Maturity Framework

From Autocomplete
to Dark Factory

A 6-level maturity model for how software teams operationalize AI — moving from keystroke assistance to an autonomous software factory.

The Big Picture

What This Framework Measures

The levels describe who does the work (human vs AI) and where human attention goes (code vs specs vs outcomes).

Not About Tools

This isn't about which AI model or brand you use. It's about your engineering workflow and control systems.

Autonomy + Control

Higher levels require stronger controls. More AI autonomy without better gates is a risk, not progress.

System Design

The "dark factory" is an engineered production system — specs, evaluation, CI/CD gates, and simulation — not just better prompts.

Level 0

"Spicy Autocomplete"

AI suggests the next line; the human accepts or rejects. A faster tab key.

  • Human is still the author of all logic, structure, and design
  • AI impact is local — small speedups, minor convenience
  • No change to SDLC controls (same PRs, same tests, same reviews)
LOW RISK
Level 1

"Coding Intern"

Human hands AI a discrete, well-scoped task. Human still handles architecture and integration.

  • AI is delegated tasks; human remains system integrator
  • Review is code-centric — human reads everything that comes back
  • Architecture decisions are explicitly human-owned
LOW RISK
Level 2

"Junior Developer"

AI handles multi-file changes and navigates codebases, but humans still read all the code.

  • Big throughput lift — if the human can review quickly
  • Bottleneck shifts to code review and context management
  • Many "AI-native" teams are actually here
MEDIUM RISK
Level 3

"Developer as Manager"

The relationship flips — you direct the AI and review at the feature/PR level. AI submits PRs for review.

  • Human becomes a portfolio manager of changes
  • AI routinely authors complete PRs end-to-end
  • Guardrails emerge: linting, test gates, policy checks
MEDIUM RISK — most teams top out here today
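At Level 3, guardrails like linting, test gates, and policy checks can be encoded as a merge gate in CI. A minimal sketch in Python, where the gate names (`lint`, `unit_tests`, `policy_scan`) are illustrative and not prescribed by the framework:

```python
# Minimal Level 3 guardrail gate: an AI-authored PR merges only if every
# required automated check reports success. Gate names are illustrative.

REQUIRED_GATES = ("lint", "unit_tests", "policy_scan")

def pr_passes_gates(checks: dict[str, bool]) -> bool:
    """Return True only if every required gate reported success."""
    return all(checks.get(gate, False) for gate in REQUIRED_GATES)

# Lint and tests pass, but the policy scan failed, so the PR is blocked.
print(pr_passes_gates({"lint": True, "unit_tests": True, "policy_scan": False}))  # False
```

The point of the sketch: the human reviews at the PR level, and the machine enforces a fixed checklist before anything merges.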
Level 4

"Developer as Product Manager"

Write a spec, leave, come back and check if tests pass. Code becomes a black box — you evaluate outcomes.

  • Unit of work becomes spec quality and evaluation completeness
  • Specs are first-class artifacts (versioned, reviewed, standardized)
  • Requires strong test/eval regime — "passes" must correlate with correctness
HIGH DEPENDENCY ON CONTROL SYSTEMS
Level 5

The "Dark Factory" — How It Works

A lights-out pipeline: humans own the what, machines own the how.

  1. ✍️ Write Spec — HUMAN
  2. 🤖 AI Agents Build — MACHINE
  3. 🌐 Digital Twin Testing — MACHINE
  4. 🔒 Holdout Eval — HIDDEN FROM AI
  5. Auto Gates & Ship — MACHINE
  6. 📈 Review Outcomes — HUMAN

Key safeguard: the "Holdout Eval" scenarios live outside the codebase — the AI never sees the test criteria, so it cannot game them.
Digital Twins simulate real services (Okta, Jira, Slack) so AI agents can run full integration tests without touching production.
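The holdout safeguard can be sketched in a few lines: scenarios load from a path outside the AI-visible workspace, and each candidate build is scored against them. The file layout, scenario schema, and function names here are assumptions for illustration:

```python
import json
from pathlib import Path

def load_holdout(suite_path: Path) -> list[dict]:
    """Load hidden scenarios from a path OUTSIDE the AI-visible workspace,
    e.g. a secured mount the build agents cannot read."""
    return json.loads(suite_path.read_text())

def score_build(build, scenarios: list[dict]) -> float:
    """Fraction of hidden scenarios the candidate build passes."""
    passed = sum(1 for s in scenarios if build(s["input"]) == s["expected"])
    return passed / len(scenarios)

# A toy "build" that doubles its input, scored against two hidden scenarios.
scenarios = [{"input": 2, "expected": 4}, {"input": 5, "expected": 10}]
print(score_build(lambda x: 2 * x, scenarios))  # 1.0
```

Because the agents only ever see the pass/fail score, not the scenario contents, they cannot overfit to the evaluation.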

Assessment

Maturity Is 3 Dimensions, Not 1

The Levels 0–5 we just covered measure one thing: how much autonomy you give the AI. But autonomy alone isn't maturity — you also need to score two more axes.

We call the three axes A (Autonomy), C (Controls), and G (Governance), each scored on the same 0–5 scale as the Levels.

A — Autonomy (the Levels 0–5 you just saw): A3–A4 (Level 3–4)
C — Controls (tests, CI/CD, eval gates): C2–C3 (Level 2–3)
G — Governance (audit trails, policy, risk): G2 (Level 2)

The danger zone: a team at A3 but only C1 — high AI autonomy with weak testing.
Dark Factory = A5 + C5 + G5 — Level 5 on all three axes.
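One way to make the three-axis profile concrete is a small score object. The danger-zone rule below (autonomy two or more levels ahead of controls, matching the A3-with-C1 example) is an illustrative threshold, not a canonical one:

```python
from dataclasses import dataclass

@dataclass
class MaturityProfile:
    autonomy: int    # A: 0-5, the Levels above
    controls: int    # C: 0-5, tests, CI/CD, eval gates
    governance: int  # G: 0-5, audit trails, policy, risk

    def in_danger_zone(self) -> bool:
        # Illustrative rule: autonomy running 2+ levels ahead of controls
        # (e.g. A3 with only C1) means AI output outpaces the gates.
        return self.autonomy - self.controls >= 2

    def is_dark_factory(self) -> bool:
        # Dark Factory = A5 + C5 + G5: Level 5 on all three axes.
        return (self.autonomy, self.controls, self.governance) == (5, 5, 5)

print(MaturityProfile(autonomy=3, controls=1, governance=2).in_danger_zone())  # True
```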

Strategy

Tiered Target by Risk Class

Don't race to Level 5 everywhere. Match your A (autonomy) and C (controls) targets to system criticality.

Tier | Systems | Autonomy (A) | Controls (C)
Tier 1 — High Risk | Regulated, money-moving, identity & access | A2–A3 | C3–C4 (very strong)
Tier 2 — Medium Risk | Internal platforms, data pipelines, ops tooling | A3–A4 | C4
Tier 3 — Low Risk | Front-ends, prototypes, internal productivity apps | A4–A5 | Proving ground for scenarios & twins
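The tier targets above can be captured as data so tooling can enforce an autonomy ceiling per system. The keys and structure are illustrative assumptions, with the ranges taken from the table:

```python
# Tier targets from the table, as data a CI system could consult before
# granting an AI agent more autonomy on a given service.
TIER_TARGETS = {
    "tier1_high_risk":   {"autonomy_max": 3, "controls_min": 3},  # A2-A3, C3-C4
    "tier2_medium_risk": {"autonomy_max": 4, "controls_min": 4},  # A3-A4, C4
    "tier3_low_risk":    {"autonomy_max": 5, "controls_min": 0},  # A4-A5, proving ground
}

def autonomy_allowed(tier: str, requested_level: int) -> bool:
    """Reject requests for an autonomy level above the tier's ceiling."""
    return requested_level <= TIER_TARGETS[tier]["autonomy_max"]

# A Level 5 dark factory request is refused for a regulated system.
print(autonomy_allowed("tier1_high_risk", 5))  # False
```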
Roadmap

Enterprise Adoption in 4 Phases

A practical migration path from standardized AI-assisted development to autonomous delivery.

1
Standardize L2–3
Approved toolchain, spec templates, AI-assisted PR generation, baseline telemetry.
2
Build Control Plane
External scenario suites (holdout), strong CI/CD gates, initial digital twins.
3
Move to L4
Spec-driven, outcome-based review. Code reading becomes exception-based.
4
Dark Factory (L5)
Only where justified — high change volume, low risk, proven scenario + twin maturity.
The Bottom Line

One Sentence for Decision Makers

"Maturity is not how much code AI writes. It's how confidently you can ship without reading code — which depends on external scenario evaluations and safe simulation environments."