The Dark Factory Model for AI-Driven Software Development
Most engineers still treat AI-generated code like work from a junior developer they do not fully trust. They read every line. They second-guess every pattern. They turn a productivity tool into a bottleneck.
That is the wrong mental model. Simon Willison at simonwillison.net provides a better framework: the dark factory.
Simon also made an observation that landed with particular force. In his work on Agentic Engineering, he wrote that AI tools amplify existing expertise. The more experience you have as a software engineer, the faster and better your results with coding agents.
I have been writing software for more than 30 years. That reframed everything.
What a dark factory actually is
A dark factory, also called a lights-out factory, is a manufacturing facility that runs without humans on the floor. No workers. No lights needed. Robots operate from precise engineering blueprints, tight tolerances, and automated quality gates. Humans designed the system. The system builds the product.
Real dark factories took decades and serious capital investment to get right. Engineers designed every constraint and mapped every failure mode. They built statistical process controls into each step, not just the final inspection. The destination looks clean. The build cost is real.
The same is true here. The model I am describing is not a shortcut. It is a discipline.
Experience is the raw material
Simon is right. But the phrase "amplify existing expertise" deserves more precision. Vague inspiration does not help anyone act differently.
What 30 years of experience actually gives you is a library of failure modes. I know which patterns look elegant but collapse under production load. I know where security assumptions get made silently and incorrectly. I understand idempotency guarantees across every layer of a distributed system. I can define that requirement so precisely that the AI has no room for misinterpretation.
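That kind of precision can be made executable. Here is a minimal sketch, with hypothetical names, of what "no room for misinterpretation" looks like when an idempotency requirement is written as a check rather than a sentence: replaying the same request must not change state twice.

```python
# Hypothetical example: a credit operation made idempotent via a request ID.
# The spec's requirement ("replays are no-ops") is stated as runnable behavior.

class Account:
    def __init__(self):
        self.balance = 0
        self._seen = set()  # idempotency keys already applied

    def apply_credit(self, request_id: str, amount: int) -> int:
        """Apply a credit exactly once per request_id; replays are no-ops."""
        if request_id not in self._seen:
            self._seen.add(request_id)
            self.balance += amount
        return self.balance

acct = Account()
acct.apply_credit("req-1", 50)
acct.apply_credit("req-1", 50)  # retry of the same request: no double credit
assert acct.balance == 50
```

A spec that includes a check like this leaves the AI nothing to interpret. Either the replay is a no-op or the build fails.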
That specificity makes the model work. The AI does not guess what I mean. The interpretation gap closes because the input is precise. That precision draws on decades of accumulated judgment.
This raises a fair question for engineering leaders: Does every engineer on your team need 30 years to participate? No. But they need enough. Enough pattern literacy to write a spec that constrains the AI. Enough honesty to recognize their own blind spots.
The spec is the factory floor
Before a line of code gets generated on my teams, we write. We produce specs, architecture decisions, expected patterns, and testing requirements. We establish a clear definition of done.
The plan is not overhead. It is the factory floor.
Two things matter here that manufacturing engineers would recognize immediately.
First, the spec must stay within the system's capability envelope. Identifying the boundaries of that envelope is a core leadership responsibility. If you ask the AI to solve a problem that is genuinely novel or architecturally ambiguous, defect rates spike. Know what the system executes reliably and design your specs accordingly.
Second, process control does not live only at the end. It runs through every checkpoint. This includes linting, type checking, unit tests, integration tests, and security scanning. The pipeline is the equivalent of statistical process control embedded at every step. Final testing confirms. The pipeline validates continuously.
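The pipeline-as-process-control idea can be sketched in a few lines. This is a hypothetical gate runner, not any particular CI system; the stand-in commands would be a real linter, type checker, and test suite in practice. The point is that the line stops at the first failed gate, not at final inspection.

```python
import subprocess
import sys

def run_gates(gates):
    """Run each (name, command) gate in order; return the first one that fails."""
    for name, cmd in gates:
        result = subprocess.run(cmd, capture_output=True)
        if result.returncode != 0:
            return name  # the gate that jammed the line
    return None  # all gates passed

# Stand-in commands; a real pipeline would invoke ruff, mypy, pytest, etc.
gates = [
    ("lint", [sys.executable, "-c", "pass"]),
    ("types", [sys.executable, "-c", "pass"]),
    ("tests", [sys.executable, "-c", "raise SystemExit(1)"]),  # failing suite
]
assert run_gates(gates) == "tests"
```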
What you still review, and why
I do not do line-by-line code review as my primary quality mechanism. That is a different statement from "I never look at the code."
There are surfaces I always examine regardless of test results. These include security boundaries, authentication and authorization flows, data models, and API contracts. These are the places where an LLM can produce code that passes every test while being wrong in a way you did not think to test for.
LLMs are stochastic, not deterministic. The same spec can produce different outputs across runs. A robot welder does not do that. That non-determinism is manageable, but it requires a testing strategy that goes beyond happy-path coverage. Adversarial testing and edge-case enumeration are required layers.
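Edge-case enumeration in practice looks like this. The function and its case table are hypothetical; the point is that the inputs go well past the happy path, into the territory where a different generation run might have behaved differently.

```python
import re
import unicodedata

# Hypothetical generated function under test: a URL slugifier.
def slugify(text: str) -> str:
    text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode()
    text = re.sub(r"[^a-zA-Z0-9]+", "-", text).strip("-")
    return text.lower()

# One happy-path case, then the edges a stochastic generator can get wrong.
cases = {
    "Hello World": "hello-world",    # happy path
    "": "",                          # empty input
    "  --  ": "",                    # separators only
    "Café au lait": "cafe-au-lait",  # accented characters
    "aaa!!!b": "aaa-b",              # punctuation runs collapse to one dash
}
for raw, expected in cases.items():
    assert slugify(raw) == expected, (raw, slugify(raw))
```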
Simon's red/green TDD pattern is worth adopting explicitly. Write the tests before generation starts. Confirm they fail as expected. Then let the agent implement to green. This makes acceptance criteria structural rather than aspirational.
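The red/green pattern can be shown in miniature. All names here are hypothetical; the shape is what matters. The acceptance test exists first, demonstrably fails against a stub, and then constrains the implementation unchanged.

```python
def acceptance_test(normalize):
    # Spec: normalize() trims surrounding whitespace and lowercases.
    assert normalize("  Hello ") == "hello"

# Red: the stub must fail, proving the test actually constrains behavior.
failed = False
try:
    acceptance_test(lambda s: s)  # unimplemented stub
except AssertionError:
    failed = True
assert failed, "a test that cannot fail is not a gate"

# Green: the generated implementation must pass the same, unchanged test.
acceptance_test(lambda s: s.strip().lower())
```

Confirming the red state first is the step teams skip most often, and it is the one that makes the criteria structural: a test that passes against a stub was never a gate.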
The retooling problem
Dark factories retool slowly. Reconfiguring a physical production line is expensive. Software requirements change fast, often mid-sprint.
When the spec changes significantly, the factory does not adapt automatically. The guardrails and test suites may no longer fit the new requirements. Restarting generation without updating the spec is where the model breaks down most predictably.
Treat the spec as a living document. Apply the same rigor you would to production code. Version it. Review it. When the spec drifts, the factory output fails.
Who responds when the factory jams
Dark factories have humans on call. Sensors surface anomalies. Exceptions route to people who can diagnose and intervene. The lights are off, but the dashboards are on.
Your equivalent is observability and clear ownership of the pipeline. Someone must be accountable when a generated component fails in a way the tests did not catch. Name that person before the factory runs. Do not wait until it breaks.
Guardrails also degrade over time. Standards erode as teams turn over. Build an audit loop into your process. Periodically review whether generated code still conforms to the architecture. The factory needs maintenance even when it runs well.
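One way to automate part of that audit, sketched here with hypothetical layer names, is a periodic scan of generated modules for imports the architecture forbids, for example a handlers layer that quietly bypasses the service layer and talks to the database directly.

```python
import ast

# Hypothetical rule: modules in the "handlers" layer may not import "db".
FORBIDDEN = {"handlers": {"db"}}

def violations(module_name: str, source: str) -> list:
    """Return forbidden imports found in a module's source."""
    layer = module_name.split(".")[0]
    banned = FORBIDDEN.get(layer, set())
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]
        else:
            continue
        found += [n for n in names if n.split(".")[0] in banned]
    return found

# Generated code that drifts past the service layer gets flagged:
assert violations("handlers.users", "from db.models import User") == ["db.models"]
assert violations("services.users", "import db") == []
```

Run on a schedule, a check like this turns "periodically review conformance" from a calendar reminder into a gate.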
The shift is real
AI coding tools are not fancy autocomplete. They are a production system that requires design, constraints, and continuous monitoring.
If you have the experience, this moment is an extraordinary amplifier. Your decades of pattern knowledge and failure memory flow directly into the guardrails that make the factory run.
But the amplification multiplies what is already there. You must have something worth amplifying.
Write the spec. Define the guardrails. Test the edges. Audit the drift. Stay upstream.
The factory runs. You just have to build it right.
Hat tip to Simon Willison for the dark factory framing and the Agentic Engineering work. Read him at simonwillison.net.