Code Review Your Prompts

The thread promised "a hidden Claude feature called Burnout Recovery Protocol Mode" that would reset cortisol baseline and recover six months of exhaustion in 14 days. Seven prompts, each with role definitions and methodology sections. Each citing real researchers. Hundreds of thousands of likes. Engineering leaders sharing it.

None of it would survive a code review.

We apply rigorous evaluation to architectural decisions, library choices, and security reviews. We catch memory leaks in 30 seconds and demand load tests before we trust a dependency. Then we copy-paste a "scientifically backed protocol" from a stranger on the internet and run it on our lives. Or worse, we ship that same uncritical pattern into the prompts powering our agentic SDLC pipelines.

Prompts deserve the same scrutiny as code. Here is a framework for applying it.

A note before diving in: this is model-agnostic. The framework works for prompts aimed at Claude, GPT, Gemini, or Llama, and for any system prompt you find in a GitHub repo. The failure modes are universal because they come from how prompts are written, not which model runs them.

The pattern

Before the framework, the pattern. Viral prompts circulating on Twitter, LinkedIn, Medium, and Facebook share a remarkably consistent set of failure modes. Once you see them, you cannot unsee them.

Pseudoscience dressed in neuroscience vocabulary. "Dopamine baseline compression," "cortisol reset," "HPA axis dysregulation." These terms map to real concepts in their original clinical context, then get stretched into folk science. The dopamine detox concept, for instance, has been thoroughly debunked by working neuroscientists. Dopamine is a neurotransmitter involved in motivation and prediction error, not a pleasure currency that depletes and refills. You cannot "reset" it in 14 days because that framing does not describe how the system works.

Name-dropping researchers with no real link to the science. Anna Lembke, Andrew Huberman, Matthew Walker, Stephen Porges. Real people doing real work, cited as if their endorsement is implicit. Spot-check the actual research and the gap shows up fast. Walker's Why We Sleep drew a detailed critique from Alexey Guzey flagging factual errors, with Columbia statistician Andrew Gelman calling one issue a potential "smoking gun" for research misconduct. Porges's polyvagal theory was declared "untenable" in February 2026 by Paul Grossman and 38 co-signatories in Clinical Neuropsychiatry, arguing core premises are not defensible based on existing neurophysiological evidence. Huberman has drawn pointed criticism from working scientists for cherry-picking weak studies and extrapolating animal research to human prescriptions. Citing a researcher does not transfer their rigor. It borrows their credibility.

Invented timelines. "14 days to reset." "21 days to rewire." "Six weeks to mastery." These numbers feel concrete. They are almost always fabricated. The "21 days to form a habit" myth traces to plastic surgeon Maxwell Maltz, who wrote in 1960 that his patients took a minimum of about 21 days to adjust to a changed appearance or a lost limb. Self-help books then laundered the observation into a rule. Actual habit formation research (Lally et al., 2010) found a median of 66 days, with individuals ranging from 18 to 254 days. Burnout recovery, when severe, takes months to a year, not 14 days.

Fake credentialing. "Behavioral scientist who has worked with 1,000-plus high-performers." "Sustainable performance designer who has worked with athletes, founders, and surgeons." These are roleplay instructions to a language model. The credentials exist only in the prompt itself. The author is not a behavioral scientist. The model is not either. The credential is theater.

Pop-psychology resets that sidestep the honest reality. Changing habits, behaviors, and identity patterns takes work, commitment, and time. The viral prompt economy sells the opposite. Two weeks. One protocol. Hidden mode unlocked. The framing is appealing because the real answer is hard, and the hard answer does not generate engagement.

If a prompt promises a fast reset to a hard problem, it's selling you something. The result is not what you're buying.

The framework

Four questions. Apply them to any prompt before you run it.

1. What is the underlying claim?

Strip the role definition, the methodology bullets, and the output format. What is this prompt actually asserting will happen? Often the answer is something the prompt would never state plainly, because stated plainly it would sound absurd. "Run these seven prompts and recover six months of burnout in 14 days" is the actual claim of the viral thread. Written that way, it fails the smell test immediately.

The framing exists to keep you from seeing the claim. Make yourself see it.

2. Does evidence support it?

Spot-check one citation. If the prompt invokes a researcher, does that researcher actually say what is being implied? Open a tab. Read an abstract. Search for critiques of the work, not just summaries that agree with it.

This takes 90 seconds. Almost nobody does it. That is the entire business model of the viral prompt economy.

3. What is framing versus technique?

This is the most important question, and the one that generalizes furthest. Separate the costume from the substance. Role definitions ("You are an executive coach with 15 years of experience"), dramatic context ("My baseline pleasure response is compressed"), and methodology theatrics ("Front-load week one with nervous system regulation") are framing. The actual instructions to the model are the technique.

The pattern shows up everywhere, not just in wellness content. Consider a coding prompt that starts with: "You are a 10x principal engineer with 20 years of experience at FAANG companies. You write production-grade, enterprise-scale code that scales to billions of users." Strip the costume. The actual instruction is: "Write good code." The credentialing adds nothing the model can act on. It only sells the prompt to the reader.

As a rough rule of thumb, most viral prompts are about 80 percent framing and 20 percent technique. The framing is what sells the prompt on social media. The technique is what produces the output. Optimizing the framing without improving the technique is selling, not engineering.
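The framing-versus-technique split can be approximated mechanically, at least as a first pass. The sketch below assumes the prompt uses XML-style sections and treats the role and context sections as framing; both the tag names and that classification are assumptions made for illustration, and `framing_share` is a hypothetical helper. Framing can also hide inside a methodology section, so the number is a conversation starter, not a measurement.

```python
import re

# Assumption: sections tagged <role> and <context> count as framing.
# A real prompt may need a different set, or a manual read.
FRAMING_TAGS = {"role", "context"}

# Matches <tag>...</tag> pairs, including bodies that span several lines.
SECTION = re.compile(r"<(\w+)>(.*?)</\1>", re.DOTALL)


def framing_share(prompt: str) -> float:
    """Return the fraction of tagged words that sit in framing sections."""
    framing_words = 0
    total_words = 0
    for tag, body in SECTION.findall(prompt):
        count = len(body.split())
        total_words += count
        if tag in FRAMING_TAGS:
            framing_words += count
    # An untagged prompt has nothing to measure, so report zero.
    return framing_words / total_words if total_words else 0.0


sample = "<role>You are a guru</role><methodology>Track habits</methodology>"
share = framing_share(sample)
```

A high share does not prove the prompt is bad, and a low share does not prove it is good. It only tells you where to start reading.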

4. How would you rebuild it?

This is the real test. If you cannot rewrite a prompt to be better, you do not understand why it works. If you can, you have turned a piece of content into a tool you actually own.

Rebuilding forces you to identify which parts of the original were doing real work and which parts were performance. It is the equivalent of refactoring someone else's code. You find out fast what they understood and what they were faking.

A worked example

Here is one of the viral prompts. Names and identifying details removed.

<role>
You are a behavioral psychologist trained in dopamine
homeostasis research and neuroscience protocols.
</role>

<context>
My baseline pleasure response is compressed. Normal
activities don't feel rewarding because I've overstimulated
my dopamine system. I need a strategic phased reduction,
not a binary detox.
</context>

<methodology>
1. Ask me to list my top 7 dopamine spikes in a typical day
2. Rate each on frequency per day and dependence level 1-10
3. Design a 14-day phased reduction:
   - Days 1-3: Awareness only (track, do not change)
   - Days 4-7: Reduce frequency 50%
   - Days 8-14: Replace with low-dopamine alternatives
4. Build the replacement behavior library
</methodology>

<output_format>
- Dopamine Spike Profile (top 7, with ratings)
- Daily Reduction Calendar across 14 days
- Replacement Behavior Library
- Withdrawal Symptoms to Expect
- Day 14 Reset Signals (how to know your baseline is recovering)
</output_format>

Apply the four questions.

The claim. Stripped of framing: "Following this 14-day plan will measurably reset your dopamine system to a healthier baseline." That is a neuroscientific claim about a specific physiological change occurring in a specific timeframe.

The evidence. Dopamine does not have a "baseline" that gets "compressed" by phone use in any clinically meaningful sense. Medical News Today and the Cleveland Clinic both note no substantial evidence supports the concept. Neuroscientists interviewed by The Scientist called the framing a fundamental misunderstanding of how the system works. The 14-day window is invented. "Withdrawal symptoms" for non-substance behavior is borrowed clinical vocabulary that does not map cleanly to checking Instagram.

Framing versus technique. Strip the costume. The actual technique is: track high-stimulation habits, reduce them gradually, substitute alternatives. That is behavioral activation and habit substitution, both of which have real evidence behind them (Gollwitzer's implementation intentions, Fogg's behavior model, Lally's habit formation research). The neuroscience framing adds nothing to the technique. It only adds credibility theater.

The rebuild.

<role>
You are a behavior change coach drawing on evidence-based
frameworks: behavioral activation, Gollwitzer's implementation
intentions, and stimulus control from cognitive behavioral
therapy. You do not invoke dopamine mechanisms unless citing
specific peer-reviewed evidence.
</role>

<context>
I want to reduce compulsive engagement with high-stimulation
activities that crowd out things I value more. I'm looking
for behavioral strategies with evidence behind them, not
neuroscience claims.
</context>

<methodology>
1. Help me identify 5 to 7 compulsive habits. For each, identify:
   - The cue or trigger
   - The reward I'm actually seeking (stress relief, boredom
     escape, social connection)
   - A competing behavior that could deliver a similar reward
2. Design a plan using:
   - Week 1: Self-monitoring and trigger identification
   - Week 2: Stimulus control (environment changes) plus
     implementation intentions ("when X, I will Y")
   - Week 3: Habit substitution with the competing behaviors
3. Build in honest expectations about relapse, which is normal
   and not failure, plus specific recovery protocols.
4. Distinguish compulsive use from legitimate use of the same
   tool. Scrolling LinkedIn is different from posting on it.
</methodology>

<output_format>
- Habit audit with triggers and underlying rewards
- Environmental design changes for each habit
- If-then implementation intentions for high-risk moments
- Replacement behaviors matched to the actual reward sought
- Relapse recovery plan
- Weekly check-in questions
</output_format>

<constraints>
- Do not invoke dopamine mechanisms without specific citations
- Acknowledge uncertainty
- Recommend a licensed therapist if patterns suggest clinical
  depression, anxiety, or addiction
</constraints>

The output difference is concrete. The original prompt produces a confident 14-day calendar with phrases like "Day 7: Your dopamine receptors are beginning to upregulate" and "Day 14: You should notice your baseline recovering." The rebuild produces a habit audit that names the actual reward each behavior delivers, environmental changes to remove cues, and a relapse plan that treats setbacks as data rather than failure. The first reads like a wellness app. The second reads like a clinician's intake notes.

The rebuild is longer than the original. It does less marketing and more work. It targets the actual mechanism (habit loops) instead of imaginary neurochemistry. It builds in honest expectations about relapse. It acknowledges the limits of what an AI conversation can do.

This is what a code-reviewed prompt looks like.

From personal prompts to production systems

The same uncritical patterns are migrating into engineering workflows. The bridge between a viral wellness prompt and your CI/CD pipeline is shorter than it looks. Both depend on the same skill: distinguishing what a prompt claims from what it actually instructs the model to do.

Spec-driven development, agentic SDLC pipelines, multi-agent orchestration, and Claude Code configurations all depend on prompts that survive scrutiny. If the industry cannot evaluate a self-help prompt critically, what happens when prompts are running production pipelines?

Consider what a fake-credentialed prompt costs in production. A system prompt that opens with "You are a senior security engineer with 20 years of experience reviewing code for vulnerabilities" does nothing functional. The model is not a senior security engineer. It cannot become one through assertion. But that prompt running in a CI pipeline creates a false confidence that security review happened, when what actually happened was pattern matching against a costume. The cost is not theoretical. It is the gap between "this code was reviewed" and "this code was reviewed by something competent to catch the vulnerability that ships to production."

The viral prompt economy is training a generation of practitioners to mistake framing for technique, defer to fake credentialing, and skip evidence checks because the formatting looks rigorous. That habit does not stay confined to wellness content. It shows up in the prompts your engineers write, the agents your platform deploys, and the decisions those agents make on your behalf.

The four-question framework is for upstream prompt design. It is the skill that makes downstream evals (golden datasets, A/B comparisons, output quality scoring) useful in the first place. Evals catch defects in prompts that already exist. The framework prevents defective prompts from being written.

What changes on Monday

Three practices to adopt this week.

Audit one prompt you trust. Pick a prompt currently running in a workflow you own, whether it is a Claude Code configuration, a Cursor system prompt, a GitHub Copilot custom instruction, or an agent definition in a pipeline. Apply the four questions. You will likely find it is more framing than technique. Rewrite it.

Establish a prompt review checkpoint. Prompts that ship to production should pass through the same review surface as code that ships to production. That can be a section in your existing code review template, a dedicated prompt repository with pull request review, or a checkpoint in your spec-driven workflow. Pick one and make it real.
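One way to make the checkpoint real is a small lint step that runs alongside your other checks. The sketch below is a minimal illustration, not a complete linter: the rule names and regular expressions are hypothetical, chosen to echo the failure modes described earlier (credential claims, fixed timelines, hype terms), and `lint_prompt` is an invented helper.

```python
import re
from typing import NamedTuple


class Finding(NamedTuple):
    line: int   # 1-based line number within the prompt text
    rule: str   # name of the rule that matched
    text: str   # the matched fragment, for the reviewer to inspect


# Hypothetical rules for illustration; tune or replace them for your own prompts.
RULES = {
    "credential-claim": re.compile(r"\b\d+\+?\s+years of experience\b", re.I),
    "fixed-timeline": re.compile(r"\b\d+[- ]days?\b", re.I),
    "hype-term": re.compile(r"\b(10x|world[- ]class|hidden mode)\b", re.I),
}


def lint_prompt(text: str) -> list[Finding]:
    """Flag lines that match a rule so a human can review them."""
    findings = []
    for number, line in enumerate(text.splitlines(), start=1):
        for rule, pattern in RULES.items():
            match = pattern.search(line)
            if match:
                findings.append(Finding(number, rule, match.group(0)))
    return findings


example = (
    "You are a senior security engineer with 20 years of experience.\n"
    "Design a 14-day phased reduction."
)
report = lint_prompt(example)
```

A non-empty report could fail the pipeline step or simply add a comment to the pull request. Either way, a match is a cue for human review, not a verdict: a legitimate prompt may mention a 14-day sprint.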

Slow down on first read. This framework adds friction. That is the point. A 90-second citation check is the cheapest insurance available against a class of error that will otherwise scale across every prompt you write. The friction pays back the first time it catches a bad prompt before it ships.

The four questions are the artifact. Print them. Pin them. Put them in your team's prompt template.

What is the underlying claim?
Does evidence support it?
What is framing versus technique?
How would you rebuild it?
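If your team's prompt template lives next to code, the four questions can travel with each prompt as a small review record. This is a sketch under assumptions: the `PromptReview` class and its field names are hypothetical, and the only rule it enforces is that no question is left blank.

```python
from dataclasses import dataclass, fields


@dataclass
class PromptReview:
    """One reviewer's answers to the four questions for a single prompt."""

    underlying_claim: str = ""
    evidence_check: str = ""
    framing_vs_technique: str = ""
    rebuild_notes: str = ""

    def unanswered(self) -> list[str]:
        """Return the names of questions that are still blank."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]


review = PromptReview(underlying_claim="Claims a 14-day dopamine reset.")
missing = review.unanswered()
```

The record does not judge the quality of the answers. It only makes a skipped question visible at review time.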

Closing

Engineering rigor is not a domain. It is a disposition. Apply it to your prompts the way you would apply it to your code: ask what it is doing, check whether the foundation holds, separate signal from theater, and rebuild what does not work.

If a prompt promises a fast reset to a hard problem, treat it the way you would treat a pull request from someone who has never seen the codebase. Politely, carefully, and with the assumption that you are going to find something broken.

You usually will.

The bottleneck has never been the model. It has always been the thinking we bring to it.
