A 6-level maturity model for how software teams operationalize AI — moving from keystroke assistance to an autonomous software factory.
The levels describe who does the work (human vs AI) and where human attention goes (code vs specs vs outcomes).
This isn't about which AI model or brand you use. It's about your engineering workflow and control systems.
Higher levels require stronger controls. More AI autonomy without better gates is a risk, not progress.
The "dark factory" is an engineered production system — specs, evaluation, CI/CD gates, and simulation — not just better prompts.
Level 0: AI suggests the next line; the human accepts or rejects. A faster Tab key.
Level 1: The human hands the AI a discrete, well-scoped task but still owns architecture and integration.
Level 2: AI handles multi-file changes and navigates the codebase, but humans still read all the code.
Level 3: The relationship flips: you direct the AI and review at the feature/PR level. The AI submits PRs for review.
Level 4: Write a spec, leave, come back and check whether the tests pass. Code becomes a black box; you evaluate outcomes.
Level 5: A lights-out pipeline: humans own the what, machines own the how.
Key safeguard: the "Holdout Eval" scenarios live outside the codebase — the AI never sees the test criteria, so it cannot game them.
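One way to realize this safeguard in CI, sketched below. The scenario directory, JSON format, and `run_scenario` hook are assumptions for illustration; the only load-bearing idea is that the scenarios live on a path the AI's repository never contains.

```python
import json
from pathlib import Path

# Hypothetical location OUTSIDE the AI-visible repo, mounted only into CI.
HOLDOUT_DIR = Path("/ci/holdout-scenarios")

def run_scenario(system_under_test, scenario: dict) -> bool:
    """Drive the built artifact with the scenario's input and compare
    against the expected outcome. `system_under_test` is whatever
    callable wraps your deployed build."""
    actual = system_under_test(scenario["input"])
    return actual == scenario["expected"]

def holdout_gate(system_under_test, holdout_dir: Path = HOLDOUT_DIR) -> bool:
    """Fail the pipeline unless every holdout scenario passes."""
    scenarios = [json.loads(p.read_text())
                 for p in sorted(holdout_dir.glob("*.json"))]
    results = [run_scenario(system_under_test, s) for s in scenarios]
    # An empty suite must not pass: missing scenarios is itself a failure.
    return bool(results) and all(results)
```

Because the gate only reads scenarios at evaluation time from a mount the agent never sees, the AI can optimize for the spec but not for the grader.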
Digital Twins simulate real services (Okta, Jira, Slack) so AI agents can run full integration tests without touching production.
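A minimal sketch of what such a twin looks like in practice: an in-process fake of a Jira-like issue API that agent integration tests can create and query tickets against. The endpoint shapes and field names here are invented for illustration and do not match the real Jira API; a production twin would mirror the real service's contract.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class JiraTwin(BaseHTTPRequestHandler):
    """Tiny 'digital twin' of an issue tracker: POST /issues creates a
    ticket, GET /issues/<id> reads it back. State is in-memory only."""
    issues = {}      # shared across requests via the class object
    next_id = [1]

    def do_POST(self):
        body = json.loads(self.rfile.read(int(self.headers["Content-Length"])))
        issue_id = f"TWIN-{self.next_id[0]}"
        self.next_id[0] += 1
        self.issues[issue_id] = body
        self._reply(201, {"id": issue_id})

    def do_GET(self):
        issue_id = self.path.rsplit("/", 1)[-1]
        if issue_id in self.issues:
            self._reply(200, self.issues[issue_id])
        else:
            self._reply(404, {"error": "not found"})

    def _reply(self, status, payload):
        data = json.dumps(payload).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def log_message(self, *args):  # keep test output quiet
        pass

def start_twin(port=0):
    """Start the twin on a background thread; returns (server, bound port)."""
    server = HTTPServer(("127.0.0.1", port), JiraTwin)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, server.server_address[1]
```

An agent pointed at `http://127.0.0.1:<port>` can then exercise a full create-then-read workflow with zero risk to a production tracker.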
The Levels 0–5 we just covered measure one thing: how much autonomy you give the AI. But autonomy alone isn't maturity; two more axes need scoring on the same 0–5 scale.
We call the three axes A (Autonomy), C (Controls), and G (Governance), each scored 0–5.
The danger zone: a team at A3 but only C1 — high AI autonomy gated by weak controls.
Dark Factory = A5 + C5 + G5 — Level 5 on all three axes.
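The axis model can be expressed as a small check. The danger-zone rule here ("autonomy outruns controls by two or more levels") is one reading of the A3/C1 example above, not a canonical threshold:

```python
from dataclasses import dataclass

@dataclass
class MaturityScore:
    autonomy: int    # A, 0-5
    controls: int    # C, 0-5
    governance: int  # G, 0-5

    def is_dark_factory(self) -> bool:
        # Dark Factory = A5 + C5 + G5: Level 5 on all three axes.
        return self.autonomy == self.controls == self.governance == 5

    def in_danger_zone(self) -> bool:
        # e.g. A3 with only C1: autonomy running well ahead of controls.
        return self.autonomy - self.controls >= 2
```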
Don't race to Level 5 everywhere. Match your A (autonomy) and C (controls) targets to system criticality.
| Tier | Systems | Autonomy (A) | Controls (C) |
|---|---|---|---|
| Tier 1 — High Risk | Regulated, money-moving, identity & access | A2–A3 | C3–C4 (very strong) |
| Tier 2 — Medium Risk | Internal platforms, data pipelines, ops tooling | A3–A4 | C4 |
| Tier 3 — Low Risk | Front-ends, prototypes, internal productivity apps | A4–A5 | Proving ground for scenarios & twins |
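The tier table can double as an enforceable policy gate. A sketch: the caps mirror the upper bound of each tier's autonomy range in the table, while the tier names and the idea of classifying systems this way are illustrative assumptions.

```python
# Maximum autonomy level allowed per risk tier, mirroring the table above.
TIER_AUTONOMY_CAP = {
    "tier1_high_risk": 3,    # regulated, money-moving, identity & access: A2-A3
    "tier2_medium_risk": 4,  # internal platforms, pipelines, ops tooling: A3-A4
    "tier3_low_risk": 5,     # front-ends, prototypes, productivity apps: A4-A5
}

def autonomy_allowed(tier: str, requested_level: int) -> bool:
    """Gate an AI agent's requested autonomy against the system's risk tier."""
    return requested_level <= TIER_AUTONOMY_CAP[tier]
```

A pipeline could call this before dispatching work, refusing to run an A4 agent against a Tier 1 system.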
A practical migration path from standardized AI-assisted development to autonomous delivery.
"Maturity is not how much code AI writes.
It's how confidently you can ship without reading code — which depends on external scenario evaluations and safe simulation environments."