On Friday afternoon, the integration fails.
Product used Jira's AI to generate the user story and a rough component spec. Design used Figma's AI to turn the wireframe into React. Engineering fed the same wireframe and ticket into a coding agent and got a different implementation. Each tool produced something plausible. None of them produced the same thing. Nobody noticed until the work converged.
That is not an edge case. It is what happens when organizations add AI capability faster than they add a framework for how that capability should be used.
We have never had more AI across the product development life cycle. Jira can generate stories. Figma can export code. Copilot can write functions. Claude Code can restructure files. Cursor can refactor an entire feature. The power is real. The problem is that every tool now wants to participate in every layer of delivery, and most teams have not decided where each one should stop.
The result is not just tool sprawl. It is context sprawl. Multiple systems generate overlapping artifacts, each with partial context, high confidence, and no shared source of truth. Teams end up with more output, but less clarity. The natural reaction is to compare tools and ask which one is best. That is the wrong question.
The real question is this: how does work move cleanly from intent to design to implementation to validation when every tool is now trying to help at every step?
That is the challenge hiding inside the AI tooling boom, and it leads to one uncomfortable truth:
AI did not create the broken handoffs between product, design, and engineering. It exposed them.
Those handoffs were already messy. We just did not move fast enough for the mess to become this visible. Now we do, and now it does.
What many organizations are building, whether they realize it or not, is an agentic software development life cycle. The question is whether they build it deliberately or inherit it by accident.
The bottleneck has inverted. Writing code is no longer what slows most teams down. Knowing what to build, and communicating it clearly, is now the expensive human problem.
## The Hourglass Inversion
For most of the modern software era, implementation was the narrowest point in the pipeline. AI is changing that. For a growing share of routine implementation work, code generation is becoming abundant: a skilled engineer with strong tools can now produce in hours what once took days or weeks. Novel architecture, complex distributed systems, and deep domain logic still require experienced judgment. That has not changed. What has changed is how much standard implementation work can now be accelerated dramatically.
Once code is abundant, the constraint moves upstream. The pipeline no longer slows down primarily because code is hard to produce. It slows down because the upstream layers are too vague, too fragmented, or too inconsistent to guide high-speed execution well. The bottleneck is no longer writing software. It is defining what should be built, specifying it clearly, and handing it off cleanly across layers.
That shift matters more than most leaders realize. When the requirements are incomplete, the interfaces are unclear, or the design lacks enough annotation to express intent, AI does not pause and ask for organizational maturity. It produces something anyway. That means the old cost of ambiguity, which used to show up as delay, now shows up as high-speed rework.
Teams can now ship plausible-looking garbage faster than ever.
At small scale, that produces frustration. At larger scale, it produces something more dangerous: institutional debt. AI-generated code enters the codebase faster than humans build shared understanding of why it exists, how it fits, and what assumptions shaped it. The velocity looks great this quarter. The maintenance bill arrives later, usually after the original prompt, author, or design context has disappeared.
That is why this is not just a tooling conversation. It is an operating model conversation.
## Why Every Tool Is Trying to Do Everything
Before going further, it helps to be precise. Most of what vendors label as "agents" today is not agentic in any rigorous sense: these are AI-assisted features operating inside a narrow context window and a constrained product boundary. That does not make them unimportant. It just means they should be designed into a pipeline according to what they can reliably do, not what a marketing page claims they can do.
The boundary problem is structural. Every vendor is under pressure to expand its role in the development life cycle before a competitor does. Jira does not want to remain a ticket system. Figma does not want to remain a design surface. Coding assistants do not want to remain line-by-line completion tools. Each product is being pushed toward end-to-end ownership.
That ambition collides directly with the upstream bottleneck. Vendors are racing into the very handoffs where organizations are already least disciplined. The result is duplicated context, inconsistent outputs, and AI features making decisions outside the domain they actually understand.
A Jira-generated implementation may fit a ticket neatly while ignoring architectural constraints. A Figma-generated component may look correct while failing your design system, accessibility standards, or state-management patterns. A coding agent may generate reasonable code while knowing nothing about the compliance boundaries or deployment conditions that shaped the requirement.
The issue is not that these tools are bad. It is that their context is narrow and their confidence is high. That is a risky combination.
No single tool is going to become the operating model for your product delivery pipeline. The organizations that navigate this well will not solve it by betting on one vendor to do everything. They will solve it by defining which layer owns which artifact, which tool is canonical at that layer, and how that artifact moves downstream.
## A Better Mental Model: The Intent Stack
The most useful reframe is to stop organizing around tools and start organizing around layers of intent.
AI works well within a layer when the job is to expand, translate, or accelerate work that is already bounded. It is much less reliable when it has to cross layers without structured input. The safest and most scalable pattern is to treat the development life cycle as a stack of intent, where each layer has a clear responsibility, a clear output, and a clear owner.
**The Intent Stack**
| Layer | Responsibility | Primary Output | Owner |
|---|---|---|---|
| 1. Vision & Strategy | Business goals, user problems, desired outcomes | Strategic Brief | Product Leadership |
| 2. Requirements | User stories, flows, contracts, domain rules | Context Spine | Product + Engineering |
| 3. Design | Interaction patterns, component behavior, annotated decisions | Annotated Design | Design + Engineering |
| 4. Implementation | Code generation, testing, refactoring, integration work | Reviewed and Tested Code | Engineering |
| 5. Validation | Integration checks, performance, security, policy gates | Verification Report | Engineering |
| 6. Feedback | User behavior, outcomes, production signals | Insight Loop | Product + Engineering |
One point the stack makes explicit is that the upstream layers cannot be treated as fuzzy pre-work. Product and design do not simply hand engineering a loose direction and wait for the code to clarify it. In an AI-accelerated system, that ambiguity becomes expensive much earlier: if requirements and design are underspecified, the implementation layer will move quickly in the wrong direction.
That is why the stack needs more than ownership. It needs clean handoffs.
## The Context Spine
Every layer transition needs an artifact the next layer can consume without guessing.
That artifact is what I call the Context Spine. It is a lightweight, living specification that carries intent through the pipeline. It is not a giant requirements document. It is not a project-management form. It is the minimum structured context needed to let downstream humans and machines act without inventing missing meaning.
Most organizations do not have this. They have a ticket with a paragraph, a design file with sparse annotations, and a lot of assumptions hiding in meetings, chat threads, or somebody's head. In a slower system, that can limp along. In a faster one, it fails noisily.
A useful Context Spine can often be built from five fields:
- problem statement
- acceptance criteria
- explicit constraints
- interfaces
- non-goals
That is enough to capture the shape of the work without dictating implementation.
**Example: Context Spine for a Notification Preferences API**
```markdown
---
id: CSPINE-001
title: User Notification Preferences API
owner: product-engineering
status: draft
version: 1.0
linked_design: figma://notification-preferences
linked_schema: /schemas/notification-preferences.v1.json
acceptance_test_suite: /tests/contracts/notification-preferences
last_updated: 2026-04-15
---

## Problem Statement

Authenticated users need to control which transactional and marketing
notifications they receive across email, SMS, and push channels. Current
behavior silently defaults all users to opted in, creating compliance risk
and avoidable customer-service volume.

## Acceptance Criteria

- Authenticated users can retrieve current preferences across all channels
- Users can update one or more channels in a single request
- Changes propagate to downstream senders within 60 seconds
- All changes are written to the audit log with actor, timestamp, and prior value

## Constraints

- Security: Valid session token required; no service-to-service access in v1
- Compliance: Must support opt-out requirements under applicable regulations
- Performance: p95 read latency under 100 ms; write latency under 250 ms
- Data: Preferences are user-scoped with no cross-account visibility

## Interfaces

- GET /v1/users/{user_id}/notification-preferences
- PATCH /v1/users/{user_id}/notification-preferences
- Output conforms to NotificationPreferences schema
- Emits preference.changed event to notification bus

## Non-Goals

- Admin override workflows
- Preference inheritance across linked accounts
- Frequency caps by channel
- Migration of previously opted-in users
```
Hat tip to Dan Greller for the frontmatter YAML idea. That YAML is not decoration. It gives the spine an identity, an owner, a version, and traceable links to adjacent assets, and it turns the document from a better-written spec into a real control point in the pipeline.
Notice what the rest of the spine does. It does not prescribe framework, language, database, or deployment target. It defines the problem with enough precision that a capable engineer, or a code-generating system, can produce a correct implementation without guessing at the business intent.
The non-goals are especially important. That is where teams prevent an eager engineer, or an eager model, from helpfully building the thing nobody actually asked for.
The Context Spine also matters for a more technical reason. Model outputs are non-deterministic. The same prompt can produce materially different implementations across runs. The only durable way to manage that variance is to make the governing intent precise enough that deviations are easy to detect and reject. In regulated environments, that is table stakes. In every other environment, it is quickly becoming good engineering hygiene.
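Because the spine pins interfaces and constraints, deviation detection can be made mechanical rather than left to review instinct. Here is a minimal Python sketch, assuming a hypothetical payload shape for the NotificationPreferences contract; the real schema would live at the `linked_schema` path and is not shown in this article:

```python
# Minimal contract check: reject generated-API output that drifts from the
# spine's interface. Field names and types here are illustrative assumptions,
# not the actual /schemas/notification-preferences.v1.json.

ALLOWED_CHANNELS = {"email", "sms", "push"}

def validate_preferences(payload: dict) -> list[str]:
    """Return a list of contract violations; an empty list means conformant."""
    errors = []
    if not isinstance(payload.get("user_id"), str):
        errors.append("user_id must be a string")
    channels = payload.get("channels")
    if not isinstance(channels, dict):
        return errors + ["channels must be an object keyed by channel name"]
    for name, prefs in channels.items():
        if name not in ALLOWED_CHANNELS:
            errors.append(f"unknown channel: {name}")  # non-goal creep shows up here
        elif not isinstance(prefs.get("opted_in"), bool):
            errors.append(f"{name}.opted_in must be a boolean")
    return errors

# Two runs of the same prompt can produce materially different shapes;
# the contract makes the deviation visible instead of letting it ship.
ok = validate_preferences(
    {"user_id": "u-123", "channels": {"email": {"opted_in": False}}}
)
bad = validate_preferences(
    {"user_id": "u-123", "channels": {"fax": {"optedIn": "no"}}}
)
print(ok)   # []
print(bad)  # ['unknown channel: fax']
```

The point is not this particular validator. It is that once the spine names the interface precisely, a deviating output fails a check instead of starting a debate.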
## What Humans Must Still Own
This kind of pipeline does not remove the need for people. It raises the value of the work only people can do well.
Humans still need to own problem framing. A model can elaborate on a requirement, but it cannot decide whether the requirement reflects the right business tradeoff, the right customer need, or the right sequencing of value.
Humans still need to own boundary decisions. Someone has to decide where flexibility is acceptable, where risk is not, and when an exception deserves judgment instead of automation.
Humans still need to own approval at transitions. The most important reviews in an AI-enabled pipeline are not line-by-line code rituals. They are moments where someone asks whether the output still matches the intent, whether the design still fits the requirement, and whether the implementation still respects the constraints.
Most importantly, humans still own accountability. Models do not carry responsibility for customer harm, compliance failure, degraded resilience, or rising maintenance burden. Leaders and engineers do.
The right goal is not to keep humans in every operation. It is to place them at the decisions that actually require judgment.
## The Governance Dividend
This is where the Context Spine stops being just a productivity artifact and becomes a governance asset.
When an internal auditor, regulator, or risk partner asks why a change exists, who approved it, and how the implementation traces back to the stated requirement, most organizations are still answering from memory, scattered artifacts, and reconstructed intent. That is manageable when software moves slowly. It becomes much harder when generated output increases the pace of change.
A disciplined Context Spine creates a durable chain from intent to implementation. It makes the reasoning behind the code visible even after the sprint ends, the engineer moves on, or the toolset changes. Regulated industries feel this first, but the pattern is broader. Once generated code exceeds shared human understanding, every organization inherits maintainability and accountability risk.
This is one reason I believe the next phase of software delivery maturity will not be defined by which coding assistant an organization bought. It will be defined by whether the organization built an auditable, comprehensible path from intent to output.
That is also why velocity alone is the wrong metric. Leaders should care about how quickly code appears, but they should care even more about whether that code aligns with the intent that justified it.
A practical metric here is Spec-to-Code Alignment. One simple proxy is to have teams track the percentage of generated code that required structural rework after review, meaning changes to logic, interfaces, data models, or integration behavior, not style cleanup. Even a rough weekly number starts to reveal whether the pipeline is getting clearer or noisier.
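A rough sketch of that proxy in Python, assuming a hypothetical review log in which each merged change records whether it was AI-generated and what kinds of rework review forced:

```python
# Spec-to-Code Alignment proxy: the share of AI-generated changes that
# survived review without structural rework. The record shape and rework
# categories are illustrative assumptions about what a team might log.

STRUCTURAL = {"logic", "interface", "data_model", "integration"}

def alignment_rate(reviews: list[dict]) -> float:
    """Percentage of generated changes that needed no structural rework."""
    generated = [r for r in reviews if r["generated"]]
    if not generated:
        return 100.0
    clean = sum(1 for r in generated if not (set(r["rework"]) & STRUCTURAL))
    return round(100 * clean / len(generated), 1)

week = [
    {"generated": True,  "rework": []},             # merged as-is
    {"generated": True,  "rework": ["style"]},      # cosmetic only: still aligned
    {"generated": True,  "rework": ["interface"]},  # structural: misaligned
    {"generated": False, "rework": ["logic"]},      # human-written: excluded
]
print(alignment_rate(week))  # 66.7
```

A falling number over several weeks points upstream, to the spine and the handoffs, long before it shows up as integration churn.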
## Six Principles for Getting This Right
### 1. Spec Before Speed
Once code becomes abundant, vague inputs become more expensive. A weak spec is no longer a slow path to confusion. It is a fast one. Teams should resist the urge to celebrate generation speed before they improve the quality of the upstream artifacts guiding it.
### 2. Declare Canonical Tools at Each Layer
Most organizations will not stop overlapping AI features from appearing across Jira, Figma, IDEs, and other platforms. That is not the realistic battle. The realistic discipline is deciding which output is canonical at each layer and explicitly treating the rest as secondary. If Figma owns the design artifact, a Jira-generated UI suggestion is advisory. If the repo-resident spec is authoritative, a chat-generated summary does not get to overrule it.
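One lightweight way to make that decision enforceable is a registry that downstream tooling can consult. The layer names follow the Intent Stack; the registry shape and the `classify` helper are illustrative sketches, not a real product API:

```python
# Canonical-tool registry: one authoritative source per layer, everything
# else explicitly advisory. Tool assignments here are example choices.

CANONICAL = {
    "requirements": "jira",
    "design": "figma",
    "implementation": "repo",  # the repo-resident spec and code are authoritative
}

def classify(layer: str, source: str) -> str:
    """Label an artifact 'canonical' or 'advisory' for its layer."""
    return "canonical" if CANONICAL.get(layer) == source else "advisory"

print(classify("design", "figma"))  # canonical
print(classify("design", "jira"))   # advisory: a Jira UI suggestion cannot
                                    # overrule the Figma design artifact
```

Writing the mapping down, even this crudely, is what turns "our design tool owns design" from a sentiment into a rule a pipeline can check.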
### 3. Produce Consumable Artifacts
Every layer should emit something structured enough that the next layer can act on it without interpreting vibes. Markdown, JSON, OpenAPI, schemas, annotated designs, and testable acceptance criteria all work. Hand-wavy prose and tribal knowledge do not.
### 4. Own the Context Spine
The most valuable asset in an AI-enabled pipeline is not the model. It is the artifact that tells the model, and the humans using it, what good looks like. Treat the Context Spine as a first-class asset in the repo, version it with the work it governs, and make drift visible.
### 5. Checkpoint Transitions, Not Every Operation
Human review belongs at layer transitions. The pipeline should handle repeatable mechanical checks such as tests, policy enforcement, security scanning, and dependency analysis. Human attention should concentrate on the places where intent can drift, assumptions can creep in, and business meaning can be lost.
### 6. Close the Feedback Loop
Shipping is no longer the hard part. Learning is. Production signals should feed back into requirements and strategy, not disappear into dashboards nobody uses. If teams generate more software without improving how they learn from real usage, they are not building an agentic SDLC. They are building a faster waterfall.
## Three Failure Modes to Avoid
The first failure mode is letting every layer generate its own version of the truth. Once product, design, and engineering each ask different tools to elaborate the same problem independently, divergence becomes almost guaranteed.
The second failure mode is treating prompts as the operating model. Prompts are useful, but they are not durable artifacts. They do not replace shared specifications, stable interfaces, or version-controlled intent.
The third failure mode is measuring output speed without measuring downstream rework. A team that ships twice as many lines of code while also doubling integration churn is not actually moving faster. It is just moving the cost somewhere less visible.
## Why This Is Hard
This is not just a tooling shift. It is organizational change.
Engineers have legitimate opinions about craft, control, and code quality. Designers have valid concerns about reducing their work to machine-ready instructions. Product managers are often asked to be more precise without getting more time or better frameworks to do it. Vendors are pushing broader AI surfaces into tools that were never designed to act as system-level coordinators.
That makes this work messy. It touches process, identity, incentives, and habits. It asks teams to become more explicit at exactly the moment when the tooling makes it easy to fake clarity.
That is why the right move is not to redesign everything at once. Start with one team, one product path, and one end-to-end flow. Prove that better boundaries, better artifacts, and better measurements reduce rework. Then expand from evidence instead of enthusiasm.
## How to Implement This Without Reorganizing Everything
Start small and make the experiment real.
Choose one product path that runs from requirement to production, ideally a feature with clear customer value and manageable cross-functional scope. Do not start with the most politically charged or technically exotic initiative in the portfolio. Pick something representative enough to matter and contained enough to learn from.
Define canonical tools and artifacts at each layer. Decide what is authoritative for requirements, what is authoritative for design, what governs implementation, and what validates release. Write those decisions down. Ambiguity here is the seed of drift later.
Create a minimal Context Spine template and store it with the work. Do not make it heavyweight. Keep it short enough that a PM, designer, and engineer can review it together and notice what is missing.
Add review checkpoints at transitions, not everywhere. Review the handoff from requirement to design, design to implementation, and implementation to release. Let the pipeline do the repetitive gatekeeping. Reserve human attention for moments where meaning can be lost.
Measure one thing for the first four to six weeks: Spec-to-Code Alignment. Watch how often generated output requires structural correction. That single signal will tell you far more about your actual maturity than vanity metrics about suggestion counts or tokens consumed.
Only expand after you have evidence. If the pilot reduces rework, improves clarity, and makes review easier, then scale the pattern. If it does not, fix the handoffs before you buy more AI.
## The Path Forward
The organizations that win over the next five years will not be the ones with the most AI tools. They will be the ones that built a coherent pipeline where every layer has a clear owner, every tool has a clear role, every artifact has a clear purpose, and every line of generated code traces back to a reason it exists.
That last point matters more than many leaders appreciate. The real long-term risk is not that AI will generate bad code occasionally. Engineers have always produced bad code occasionally. The risk is that teams will accumulate large volumes of code that nobody fully understands, tied to intent that was never captured clearly enough to survive time, scale, and turnover.
That is not a problem a better IDE will solve. It is a problem of architecture, discipline, and operating model.
The agentic SDLC is not a product you buy. It is a discipline you build.
The tools are here. The capability is real. What most organizations still lack is the architecture that makes those tools add up to something coherent.
The bottleneck moved. The pipeline needs to move with it.
