I already had a working app. I rebuilt it anyway. Twice.
GSD, short for Get Stuff Done, started as a web app and ran that way for a couple of years. It did the one job I built it for. Then I wrapped it as a PWA so it would follow me onto my phone, and that worked too, in the way something works when you have decided not to look at it too closely. Then, knowing almost to the hour what it would cost, I tore the mobile experience down to the studs and rebuilt it as a native iOS and Mac app.
The obvious version of this story is vanity. The real one is not.
The rebuild looks like a platform story, but the platform only changed the surface. What changed the outcome was the discipline underneath. The stack made the experience possible, and the boundaries made it safe. The part I am keeping is not Swift or widgets or the warm glow of finally belonging on the phone, though I will admit the glow is pleasant. It is this: one person, with an AI agent doing most of the typing, can ship a real product only when the intent is clear, the architecture enforces the rules, and the verification loop is fast enough to keep everyone honest.
The bottleneck was never the stack. It was whether the product's intent had been encoded into durable boundaries before the speed arrived.
The web app that grew up
GSD is an Eisenhower-matrix task manager. That is the whole premise. Every task gets placed on two axes, urgent and important, and falls into one of four quadrants: do first, schedule, delegate, or eliminate. The job to be done is not "store my tasks." It is decide what matters, then act, and keep that decision honest as the day changes.
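As a rough sketch of that premise, the two axes reduce to a pair of booleans and the quadrant is a pure function of them. The type and case names here are illustrative assumptions, not GSD's actual domain model; the mapping is the conventional Eisenhower one.

```swift
/// The four quadrants named above. Names are hypothetical.
enum Quadrant: String, CaseIterable {
    case doFirst, schedule, delegate, eliminate

    /// Conventional Eisenhower mapping from the two axes to a quadrant.
    init(urgent: Bool, important: Bool) {
        switch (urgent, important) {
        case (true, true):   self = .doFirst    // urgent and important
        case (false, true):  self = .schedule   // important, not urgent
        case (true, false):  self = .delegate   // urgent, not important
        case (false, false): self = .eliminate  // neither
        }
    }
}
```

Because the quadrant is derived rather than stored separately, flipping one axis is enough to move a task across the grid.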
I built it because every other task app I tried was either a notebook with checkboxes or a project-management cockpit with forty buttons. I wanted the 2×2 grid to be the entire surface, and everything else to live one layer down: recurrence, subtasks, dependencies, time tracking, analytics, import, and export. Reachable, but never crowding the one question I open the app to answer: what do I work on next?
Over two years it grew the way useful software grows. Natural-language capture, so I could type a task and let punctuation do the sorting instead of filling out a form. Recurring tasks. Subtasks. A dependency graph, so "ship the release" could know it was blocked by "finish the changelog." Time tracking. Smart views. Import and export, because it was my data and I wanted it to stay mine. It was offline-capable, it was fast, and it was the tool I actually used every day.
That last part is the trap. When the thing you built is the thing you use, "good enough" stops being a verdict and becomes a habit.
The PWA was a polite lie
I wanted GSD on my phone, and the honest, cheap, responsible answer in 2023 was a Progressive Web App. Add to Home Screen. A service worker for offline. A manifest so it got an icon and launched without browser chrome. On paper, a native-feeling app for the cost of a config file.
It was fine. It was also a polite lie, and the lie lived in the phrase "native-feeling."
A PWA can render almost anything a browser can render, which is nearly everything. What it cannot do, at least not in the way this app needed, is belong to the phone. The things that make a mobile task manager actually frictionless are precisely the things that sit deepest in the operating system:
- Capture has to be instant, from anywhere. The whole point of an Eisenhower box is to catch a thought before it evaporates. On a real phone that means a Home Screen widget, a Lock Screen widget, a Share Sheet entry so you can throw a link or a selection straight into the inbox, Siri, Shortcuts, Spotlight, and App Intents. With the PWA, capture meant unlock, find the icon, wait for the web view, tap the box, and by then half the thoughts were already gone. Capture cannot be a scavenger hunt.
- It never stopped feeling ported. Scrolling had the wrong inertia. The keyboard covered the wrong things. Swipe gestures fought the browser's own. Every interaction was a translation, and you could feel the seams.
- The platform kept it at arm's length. Web push and badging exist now, and for many apps that is enough. But GSD did not need one more way to notify me. It needed native capture surfaces that felt like part of the system instead of a detour through a browser. Those are not polish features in a task manager. They are the product.
None of these are bugs. They are the boundary of the medium. A PWA is a website that has been allowed to sit closer to the Home Screen. That is useful, and sometimes it is exactly the right trade. But a task manager you reach for thirty times a day needs to be more than nearby. It needs to be a citizen of the device.
So the question stopped being "is the PWA good enough?" It was good enough. The question became: is good enough what I want from the tool I rely on most?
It was not.
Native, not ported
The temptation, once you decide to go native, is to port. Take the web layouts, the web components, and the web mental model, then transliterate them into Swift. It is faster. It is also how you end up with a native app that feels exactly as ported as the PWA did, just compiled.
I set one rule before writing a line of the rebuild: reimagine around the platform, do not transliterate the web.
Concretely, that meant a few non-negotiables.
iPhone and iPad are co-equal, first-class targets, not one layout stretched to fit both. iPhone is a TabView. iPad is a NavigationSplitView with a sidebar and a detail pane. Each is laid out for its own size class, not scaled until it stops looking wrong.
The interactions are iOS interactions: swipe actions on cards, context menus on long-press, drag-and-drop to move a task between quadrants, a ⌘K command palette on the keyboard, widgets, and App Intents. The capture box is always one tap away. The punctuation shorthand does the classifying, with a bang for priority, an asterisk for a subtask, and a #tag to file it, so you never touch a form just to add a task.
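To give a feel for punctuation-driven capture, here is a minimal sketch of a parser. The exact grammar is an assumption made for illustration: a trailing or standalone "!" marks priority, "#word" adds a tag, and each "*" starts a subtask. The names CapturedTask and parseCapture are hypothetical, and GSD's real parser may read the same symbols differently.

```swift
import Foundation

/// Result of parsing one capture line. Hypothetical shape.
struct CapturedTask: Equatable {
    var title: String
    var isPriority = false
    var tags: [String] = []
    var subtasks: [String] = []
}

/// Assumed grammar: text before the first "*" is the task itself,
/// and every "*" after that starts one subtask.
func parseCapture(_ input: String) -> CapturedTask {
    let segments = input.split(separator: "*", omittingEmptySubsequences: false)
    var task = CapturedTask(title: "")
    var titleWords: [String] = []

    for word in (segments.first ?? "").split(separator: " ") {
        var text = String(word)
        if text.hasSuffix("!") {            // "release!" or a lone "!"
            task.isPriority = true
            text.removeLast()
        }
        if text.isEmpty { continue }
        if text.hasPrefix("#"), text.count > 1 {
            task.tags.append(String(text.dropFirst()))
        } else {
            titleWords.append(text)
        }
    }

    task.title = titleWords.joined(separator: " ")
    task.subtasks = segments.dropFirst()
        .map { $0.trimmingCharacters(in: .whitespaces) }
        .filter { !$0.isEmpty }
    return task
}
```

Under those assumptions, typing "Ship the release! #work * finish the changelog * tag the build" yields a priority task titled "Ship the release", tagged "work", with two subtasks, and no form is involved.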
And one principle shaped the architecture more than any other: privacy is the product, not a setting.
GSD is offline-first as a correctness constraint, not a marketing line. The app is fully usable with no account and no network. All your data lives on the device. Cloud sync, bidirectional with the same backend the web app uses, is strictly opt-in. The UI is never allowed to imply your data has left the device when it has not.
That sounds like a values statement. It is actually an engineering specification, and it dictated where every byte was allowed to live.
The dependency direction is the design
Here is the part that looks like Swift trivia and is actually load-bearing. If you never write a line of Swift, stay with me, because the principle is portable: enforce your architectural boundaries in tooling, so they cannot be crossed by anyone, including you, and including an agent that has no memory of why the boundary exists in the first place.
All of GSD's real logic lives in a layered Swift package, GSDKit. The app itself and its two extensions are thin shells on top. The layers are ordered, and the order is the design. Each layer is only allowed to depend downward:
GSDModel is the domain. The Task, the four quadrants, the capture parser, the recurrence engine, and the dependency graph live there. It has zero dependencies, enforced by the package manifest, so you physically cannot import the database into it. That constraint is the point. The rules of the product cannot get tangled up with how the product is stored.
GSDStore owns persistence with GRDB and exposes exactly one way to change a task: a single TaskStore that every mutation flows through. There is no other path. When anything changes a task, TaskStore stamps its updatedAt from an injected clock and enqueues it to the sync queue, in that order, every time.
A single mutation path sounds like bureaucracy until you realize it is why sync is trustworthy. There is exactly one place where "a thing changed" is true, so there is exactly one place that has to be correct.
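The shape of that single path can be sketched without a database at all. The version below is an in-memory stand-in, not GSD's GRDB-backed store: TaskRecord, save, and the array used as a sync queue are all hypothetical names chosen for the example. What it does preserve is the order described above, stamp from the injected clock first, then enqueue.

```swift
import Foundation

/// Minimal task value for the sketch. Hypothetical shape.
struct TaskRecord: Equatable {
    let id: UUID
    var title: String
    var updatedAt: Date
}

/// In-memory stand-in for a store with exactly one write path.
final class TaskStore {
    private let now: () -> Date              // injected clock, swappable in tests
    private(set) var tasks: [UUID: TaskRecord] = [:]
    private(set) var syncQueue: [UUID] = []  // ids waiting to be pushed

    init(now: @escaping () -> Date = { Date() }) {
        self.now = now
    }

    /// The only way to change a task: stamp, persist, enqueue, in that order.
    @discardableResult
    func save(_ task: TaskRecord) -> TaskRecord {
        var stamped = task
        stamped.updatedAt = now()            // 1. stamp from the injected clock
        tasks[stamped.id] = stamped          //    (persistence stand-in)
        syncQueue.append(stamped.id)         // 2. enqueue for sync
        return stamped
    }
}
```

Injecting the clock is what makes this testable: a test can pass a fixed date and assert on the stamp, whatever timestamp the caller supplied.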
GSDSync is a pure actor. The pull, push, deletion-reconcile, and last-write-wins logic all live there, unit-tested with no network and no simulator. The untestable glue lives up in the app as a separate coordinator: when to sync, watching the network path, and reacting to the app returning to the foreground. The tested core stays pure. The messy timing stays quarantined.
GSDSnapshot is the quiet hero. It is a GRDB-free contract that the widgets and the Share extension talk to, so the extensions can share real logic with the app without ever pulling the database into their process. An iOS extension is a tight, memory-constrained little thing, and you do not want a full SQLite stack loading inside your Lock Screen widget. The snapshot layer lets the extensions be first-class without being heavy.
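For readers who have not seen a Swift package manifest, here is roughly what "enforced by the package manifest" can look like. This is a sketch under assumptions, not GSD's real Package.swift: the tools version, the GRDB version requirement, and the exact dependency edges for GSDSync and GSDSnapshot are guesses made for illustration. Only the four layer names, the zero-dependency domain, and the GRDB-free snapshot layer come from the description above.

```swift
// swift-tools-version:5.9
import PackageDescription

let package = Package(
    name: "GSDKit",
    products: [
        .library(name: "GSDModel", targets: ["GSDModel"]),
        .library(name: "GSDStore", targets: ["GSDStore"]),
        .library(name: "GSDSync", targets: ["GSDSync"]),
        .library(name: "GSDSnapshot", targets: ["GSDSnapshot"]),
    ],
    dependencies: [
        // Version requirement is an assumption.
        .package(url: "https://github.com/groue/GRDB.swift.git", from: "6.0.0"),
    ],
    targets: [
        // The domain: no dependencies, so `import GRDB` cannot compile here.
        .target(name: "GSDModel", dependencies: []),

        // Persistence: the only target allowed to see GRDB.
        .target(name: "GSDStore", dependencies: [
            "GSDModel",
            .product(name: "GRDB", package: "GRDB.swift"),
        ]),

        // Sync core (edges assumed for this sketch).
        .target(name: "GSDSync", dependencies: ["GSDModel", "GSDStore"]),

        // Extension-facing contract: deliberately GRDB-free.
        .target(name: "GSDSnapshot", dependencies: ["GSDModel"]),
    ]
)
```

A target can only import modules it declares, so an import that crosses a layer fails at build time instead of in review.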
The fast feedback loop falls out of this for free. Because the logic has no UI and no backend bolted to it, the entire test suite runs in under a second from the command line, with no simulator boot and no build-and-deploy. I could change the recurrence math and know in a second whether I had broken month-end rollover.
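Month-end rollover is a good example of logic that is cheap to pin when nothing else is attached to it. The sketch below is not GSD's recurrence engine. It only illustrates one common pitfall with Foundation's Calendar: computing each monthly occurrence from the original anchor keeps "the 31st" intact, while chaining from the previous occurrence drifts once a short month clamps the date. The function names are hypothetical.

```swift
import Foundation

/// A Gregorian calendar pinned to UTC so results do not depend on the machine.
func utcCalendar() -> Calendar {
    var calendar = Calendar(identifier: .gregorian)
    calendar.timeZone = TimeZone(secondsFromGMT: 0)!
    return calendar
}

/// Convenience for building a date at midnight UTC.
func makeDay(_ year: Int, _ month: Int, _ day: Int, in calendar: Calendar) -> Date {
    calendar.date(from: DateComponents(year: year, month: month, day: day))!
}

/// Occurrence `index` of a monthly recurrence, always measured from the anchor.
func occurrence(_ index: Int, ofMonthlyFrom anchor: Date, in calendar: Calendar) -> Date? {
    calendar.date(byAdding: .month, value: index, to: anchor)
}

/// Naive alternative: step one month from the previous occurrence.
func chainedNext(after previous: Date, in calendar: Calendar) -> Date? {
    calendar.date(byAdding: .month, value: 1, to: previous)
}
```

Anchored on 31 January 2025, the first occurrence clamps to 28 February and the second returns to 31 March. Chaining from 28 February lands on 28 March instead, which is the kind of drift a sub-second test catches before it reaches a real task list.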
That speed is not a nicety. It is what let me move fast without breaking the tool I use every day.
Every one of these boundaries is enforced by the compiler or the package manifest, not by my good intentions. That distinction is why the next part of the story worked.
The clock that lied
Let me make the architecture concrete with the bug I am fondest of, because it is the kind of bug you only get to have once you are syncing real devices in the real world.
Sync is last-write-wins. Two devices edit, the most recent edit wins, and "most recent" is decided by a timestamp. Simple, until you ask the obvious question: whose clock?
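In its simplest form, the rule fits in one comparison. This is a minimal sketch with hypothetical names (Edit, lastWriteWins), and the tie-break in favor of the remote copy is an assumption for the example, not something GSD is known to do.

```swift
import Foundation

/// One side's version of a task field. Hypothetical shape.
struct Edit: Equatable {
    let title: String
    let updatedAt: Date
}

/// Last-write-wins: the later timestamp survives.
/// Ties go to the remote copy (an assumption for this sketch).
func lastWriteWins(local: Edit, remote: Edit) -> Edit {
    remote.updatedAt >= local.updatedAt ? remote : local
}
```

Everything interesting hides in where updatedAt came from, which is exactly the question the rest of this section asks.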
My first cut trusted the device clock and used it to drive the pull cursor, the marker that says "give me everything changed since X." This works perfectly right up until a device's clock is wrong, which devices' clocks routinely are, by seconds or minutes.
The failure mode is silent and nasty. If a device thinks it is slightly in the future, it asks the server for changes since a moment that has not happened yet. The server truthfully answers "nothing," and changes that should have been pulled are quietly skipped. No error. No crash. A task you completed on your phone simply never shows up on your iPad, and you have no idea why.
The fix is a one-line idea that takes real care to get right: never trust the local clock for the cursor. Advance the pull cursor using the server's stamp on each record, not the device's. The server is the only clock all devices agree on, so it is the only clock allowed to decide what "since" means.
Once the cursor follows server time, device clock skew can no longer lose a pull. The worst it can do is resolve a single conflict the wrong way, which last-write-wins is allowed to do, instead of silently dropping data, which it is never allowed to do.
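The bug and the fix are both small enough to reproduce with plain values. The sketch below uses a fake in-memory server and free functions rather than an actor, purely to keep it short. RemoteChange, FakeServer, and both cursor functions are hypothetical names, not GSD's sync engine.

```swift
import Foundation

/// A change as the server reports it, carrying the server's own stamp.
struct RemoteChange {
    let id: String
    let serverStamp: Date
}

/// In-memory stand-in for the backend's "changes since X" endpoint.
struct FakeServer {
    var log: [RemoteChange]

    func changes(since cursor: Date) -> [RemoteChange] {
        log.filter { $0.serverStamp > cursor }
           .sorted { $0.serverStamp < $1.serverStamp }
    }
}

/// Buggy version: after a pull, the cursor becomes whatever the device thinks "now" is.
func nextCursorTrustingDevice(deviceNow: Date) -> Date {
    deviceNow
}

/// Fixed version: the cursor only ever advances to a stamp the server issued.
func nextCursor(after pulled: [RemoteChange], current: Date) -> Date {
    pulled.map(\.serverStamp).max().map { max($0, current) } ?? current
}
```

Suppose a device whose clock runs 60 seconds fast pulls a change stamped t=100, and a second change is then stamped t=130. The device-clock cursor sits at 160, so the next pull returns nothing and the second change is never seen. The server-stamp cursor sits at 100, so the second change arrives as expected.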
That is the whole reason the architecture is shaped the way it is. The sync engine is a pure, testable actor specifically so that a subtle, timing-dependent, data-losing bug like this can be reproduced and pinned in a unit test that runs in milliseconds, instead of being chased across two physical devices and a flaky network.
There is a native-only sibling to this story I will spare you the depth of: 0xDEAD10CC, the termination code iOS reports when it kills an app for still holding a lock on a database file as it is suspended in the background. A widget timeline refresh or a background Siri write can wake just enough of your app to touch the database at exactly the wrong moment. The fix is to notice when the app is about to be suspended, stop touching the database, and back off gracefully.
You do not get bugs with names like that on the web. They are the toll you pay for actually living on the device.
Spec, plan, execute, on repeat
This is where the earlier boundary point becomes practical. I built almost all of this with an AI coding agent. Not the cliché version, where the AI writes your app while you watch. Something more deliberate, and more interesting.
The mistake people make with coding agents is treating them like a vending machine. Describe a feature, receive a feature, paste it in, repeat. That scales to toys and collapses on anything real, because there is no through-line. Every interaction starts from zero, the architecture drifts, and three weeks in you have a pile of plausible code that does not cohere.
What worked was imposing a rhythm and never breaking it. Every meaningful chunk of GSD went through the same three beats:
- Spec. Before any code, write down what we are building and why as a design document: the behavior, the edge cases, and the constraints. Not for ceremony. For shared ground. The agent and I had to agree on what "done" meant before "how" was on the table.
- Plan. Turn the spec into an ordered implementation plan, with the actual steps in sequence and each step small enough to verify on its own.
- Execute. Work the plan one verifiable step at a time, with the sub-second test suite as the gate between steps.
To make that concrete, one loop in the commit log looked like this: a commit reading "docs: sync engine design", then "docs: sync implementation plan", then a run of "feat:" commits working the plan one step at a time, then a "test:" commit that pinned the clock bug above so it could never come back.
Scroll the whole history and you see that shape repeat. Foundations, then the core matrix, then task depth, then filtering and smart views, then the editorial redesign, then sync hardening, then Sign in with Apple, then widgets and Siri, then the Mac port. Each phase had its own small spec, plan, execute loop.
Two things made the difference between this being a force multiplier and being a mess generator.
The first is that the architecture did the agent's homework for it. Those rigid layer boundaries are not just good design for humans. The manifest that forbids importing the database into the domain, the single mutation path, and the pure sync actor are guardrails for an agent. When the compiler and the package manifest enforce the rules, an agent cannot drift across a boundary it should not cross, no matter how confidently it tries.
The constraints that make the code testable are the same constraints that make it safe to generate.
The second is that I kept a written record of why, not just what. Alongside the code there is a log of the reasoning behind decisions, the reason each file is shaped the way it is. When you come back two months later, or when the agent comes back with no memory of the last session, that log stops you from re-litigating a decision you already made carefully the first time.
I want to be precise about my role, because it is easy to mythologize this in either direction. I did not write most of the lines. I also did not get to stop thinking. I wrote the specs. I made the architectural calls. I caught the clock bug by asking "whose clock?"
That is a question, not a keystroke, and it is the kind of question the whole thing lived or died on. The agent was extraordinary leverage on clear intent and roughly useless without it. It typed faster than I ever could, and it never once decided, on its own, what the product should be.
Here is the rule, if you want one line to take to work on Monday: an agent respects the boundaries your tooling enforces and ignores the ones that live only in a senior engineer's head.
Build the first kind.
The unglamorous miles
Here is what nobody tells you about going native: the app is the easy part. Shipping it is a long, unglamorous tail, and native makes you walk every step of it.
The web app deploys when I push. The native app has to be archived, signed, uploaded, and reviewed, on two platforms, against rules that exist whether or not they make sense for your app. I automated what I could. A release script bumps the version, archives, exports, and uploads to TestFlight for both the iOS and Mac Catalyst builds.
Automation removes the typing. It does not remove the requirements.
A sampling of the miles, none of which a PWA would have charged me for:
- Mac Catalyst. "Universal" sounds free. It is not. The Mac build needed real menu-bar commands, a sensible minimum window size, an About panel, and a careful exclusion of the iOS-only extensions from the Mac target. It also grew its own species of bug, like a web-auth callback that arrives off the main thread through Catalyst's XPC plumbing.
- In-app account deletion. App Store guideline 5.1.1(v) is blunt: if a user can create an account, they must be able to delete it from inside the app. So there is now an ordered remote-then-local deletion flow with a confirmation, an entire feature built to satisfy a rule that, to be fair, is the right rule.
- Export compliance. Every upload asks whether your app uses non-exempt encryption. For GSD, the answer was no non-exempt encryption, only ordinary HTTPS, and a single Info.plist key, ITSAppUsesNonExemptEncryption set to NO, tells App Store Connect that, so builds go straight to testing instead of stalling on the same question every time.
- A marketing demo, built like software. Even the promo video became a small engineering project: a seeded demo state behind a launch flag, an automated UI test that choreographs each scene, simulator screen recordings, and ffmpeg stitching it into a captioned 16:9 cut. Of course it did. Once you have the discipline, you point it at everything.
This is the part of "native, not ported" that does not show up in the demo. Belonging to the platform means living by the platform's rules, and there are a lot of them. I knew that going in. It is part of what I meant by "on purpose."
The objection I would raise too
Here is the fair version of the skeptic's case, because I would raise it too. I rebuilt a personal app I control. No team, no deadline, no stakeholders, no roadmap to negotiate, and full license to overbuild a tool that serves exactly one user. Most engineering does not get to take the long way around, and a clean result from a one-person project proves less than it looks like it proves.
All true. Two responses.
First, the native path was the expensive one on purpose, and I knew there were cheaper ones. A cross-platform native framework would have bought back a lot of time. I skipped it because the integrations I cared about most, widgets, Siri, Spotlight, App Intents, and a Share Sheet that actually feels native, are the ones those frameworks tend to fight hardest. That was a deliberate trade for this specific tool, not a claim that everyone should make it.
Second, and this is the part that matters, the thing that generalizes is not the native craft. It is the discipline. The constraints that let me move fast alone are the same constraints that keep a team of fifty from drifting, which means they matter more under real deadline pressure, not less.
Point an agent at a clean codebase and you get leverage. Point one at a sloppy codebase and you get the leverage to make a bigger mess faster. The part of this project worth copying has nothing to do with my having unlimited time. It is the boundaries, the single obvious path, the fast verification loop, and the written record of why.
Those are not luxuries. They are how you make speed safe.
The slower path, on purpose
So: I had a working app, and I spent months rebuilding the mobile experience from scratch, in a new language, against two platforms' worth of rules, to arrive at software that does the same job the PWA already did.
Put that way it sounds like vanity. It was not.
The PWA answered one question: can I use GSD on my phone? Native answered a better one: does GSD belong on my phone? The gap between those two questions is the reason I did it. Capture is now a widget tap or a sentence to Siri, fast enough that the thought survives the act of catching it. The app scrolls and swipes and feels designed for iOS because it was, not translated into it. My data lives on my device, provably, with sync as a choice I make rather than a default I have to trust. And the same matrix is now at home on a phone, a tablet, and a Mac, each laid out for what it is.
I went looking for an app that belonged on my phone. I came back with something I value more: a clean, layered, tested core, and proof of how it behaves under an agent.
That is the part worth copying, whether or not you ever take the long way around yourself. Look at the boundaries in your own system and ask whether your tooling enforces them, or whether you are simply trusting everyone to remember. In the era we just walked into, that answer decides whether your agents make you faster, or just make your mess arrive sooner.
Good enough is a trap precisely because it is true. The PWA was good enough. I rebuilt it natively anyway because the tool I reach for thirty times a day was worth it.
But the discipline is the part I am actually keeping, and it is the part I would hand to you.
Related reading
- The Frame Is the Bottleneck. Why the constraint moves from execution to the human framing of the problem as agents get better at the typing.
- Spec, Standards, Specialists: How I Actually Build with AI. The working loop behind the spec, plan, execute rhythm in this post: a refined spec, a layer of standards, and coordinated specialists.
- Eleven Years, One Week, and an AI Co-Pilot: Rebuilding TravelTimes for iOS. Another long-lived app rebuilt with an agent in a focused sprint, including the parts the agent got wrong.
