A Day with Claude Code from My iPhone
Here's What Happened. Spoiler Alert: It (Pretty Much) Crushed It.
Abstract
I spent a day using only Claude Code on my phone to rewire a complex agentic system, skipping all the usual onboarding and context you'd give a human engineer. The results were wild: Claude Code handled the technical migration with minimal input, but missed key UX details and output consistency because I didn't spell out what mattered most.
If you want AI to do high-judgment work, you have to give it the right context, checkpoints, and clear instructions, just like you would with a real teammate. Coding is rapidly becoming a commodity skill, but the ability to shape the context and define what 'good' looks like is how you'll actually steer the future.
The Experiment
I spent 24 hours (and every single daily credit) using the Claude Code app on my iPhone. I overhauled the architecture of a core agentic system. Roughly 15,000 lines of code across 25+ files. 🤣
No laptop. No IDE. No special developer setup. Just me, my phone, and a sense that this was either brilliant or reckless.
Here's what went well, what didn't, and the lessons I took away.
The Setup (Or… Lack Thereof)
Here's the thing: I intentionally kept the setup minimal. I gave Claude Code access to my repo and described the change I wanted. That was it. No CLAUDE.md file with project context. No architecture diagrams. No guardrails about what to preserve. I wanted to see what happens when you treat Claude Code like a senior engineer you just dropped into your codebase with zero onboarding.
The change was ambitious. My app used a rigid agentic architecture: a predefined input/output chain of agents wired together in explicit pipelines. Each pipeline mapped to a specific user intent that I detected from incoming prompts. Intent A triggers Pipeline A, which calls Agent 1 → Agent 2 → Agent 3 in fixed order.
This worked, but it was brittle. Every new intent meant wiring up a new pipeline. Worse, it locked future models into a reasoning path I designed, rather than letting the models leverage their own increasingly sophisticated reasoning to choose the right approach. I was essentially micromanaging the LLM.
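To make the brittleness concrete, here's a minimal Python sketch of the old pattern. The agent names and intents are placeholders I invented for illustration; the real system's agents aren't shown here.

```python
# Hypothetical sketch of the rigid pattern: each detected intent maps
# to a fixed, hand-wired chain of agents.
from typing import Callable

Agent = Callable[[str], str]

def summarize(text: str) -> str:      # placeholder agents standing in
    return f"summary({text})"         # for the real system's agents

def enrich(text: str) -> str:
    return f"enriched({text})"

def format_output(text: str) -> str:
    return f"formatted({text})"

# Every new intent means wiring up a new explicit pipeline by hand.
PIPELINES: dict[str, list[Agent]] = {
    "intent_a": [summarize, enrich, format_output],
    "intent_b": [enrich, format_output],
}

def run(intent: str, prompt: str) -> str:
    result = prompt
    for agent in PIPELINES[intent]:   # fixed order; the model gets no say
        result = agent(result)
    return result
```

The fixed `for` loop is the whole problem: the reasoning path is frozen at design time, no matter how capable the underlying model gets.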
The migration target: a tool registry and defined output guideline pattern. Instead of hardcoded pipelines, the LLM would have access to a registry of tools and clear guidelines for what the output needed to look like. It could then decide which tools to invoke, in what order, based on the detected intent. More like giving a great problem-solver a toolbox rather than a step-by-step checklist.
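In the target shape, tools register themselves with descriptions, and per-intent output guidelines replace hardcoded chains; the model sees both and picks its own path. A minimal sketch under the same hypothetical names as above (the actual registry and prompt format in my system differ):

```python
# Hypothetical sketch of the target pattern: a tool registry plus
# output guidelines, with tool selection left to the model.
from typing import Callable

TOOL_REGISTRY: dict[str, dict] = {}

def tool(name: str, description: str):
    """Register a function so the LLM can discover and invoke it."""
    def wrap(fn: Callable[[str], str]):
        TOOL_REGISTRY[name] = {"fn": fn, "description": description}
        return fn
    return wrap

@tool("summarize", "Condense text while preserving key facts")
def summarize(text: str) -> str:
    return f"summary({text})"

@tool("enrich", "Add supporting context to a draft")
def enrich(text: str) -> str:
    return f"enriched({text})"

# Guidelines describe what the output must look like per intent,
# instead of prescribing which agents run in which order.
OUTPUT_GUIDELINES = {
    "intent_a": "Return a concise, enriched summary in the house style.",
}

def build_prompt(intent: str, user_input: str) -> str:
    """Assemble what the model sees: the toolbox, not a checklist."""
    tools = "\n".join(
        f"- {name}: {t['description']}" for name, t in TOOL_REGISTRY.items()
    )
    return (
        f"Available tools:\n{tools}\n"
        f"Output guideline: {OUTPUT_GUIDELINES[intent]}\n"
        f"Input: {user_input}"
    )
```

The orchestration loop that actually lets the model call these tools is omitted; the point is that adding a new capability now means registering one tool and writing one guideline, not wiring a new pipeline.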
What Actually Happened
I kicked off the task from my phone, described the architectural vision, and let Claude Code run. (🏃🏻♂️💨)
And it worked. Kind of.
Over the course of the day, Claude Code:
Identified the existing pipeline architecture across the codebase
Created the tool registry abstraction
Defined output guidelines for each intent category
Rewired the agent orchestration to use dynamic tool selection
Updated 25+ files to support the new pattern
Kept the system functional end-to-end (it still ran)
That's impressive. With minimal context, it understood the existing architecture well enough to replace it with something fundamentally different. On a phone. While I was living my life.
The Mobile Experience
I'll say this plainly: the Claude Code mobile experience is much better than you'd expect. No special setup, no "Remote Control" tethered to a laptop terminal. I just opened the Claude app on my iPhone, connected it to my repo, and started working. My phone was the only device I used all day. The laptop stayed closed.
That's worth emphasizing because people assume mobile means a compromised experience. It wasn't. I could describe architectural changes, review what Claude Code produced, course-correct when it went in the wrong direction, and keep the whole thing moving.
There are limitations, though. Reviewing 25 changed files on a phone screen is not the same as doing it in Cursor. You're working with compact artifact views that are fine for quick checks, but not for careful line-by-line review. And I didn't invest enough time in figuring out how to QA the visual design from the phone.
But the core loop - describe what you want, let it run, check in, redirect - works amazingly well. The voice mode is especially nice for thinking out loud while you're walking around. I’m genuinely blown away.
Where It Went Wrong
Claude Code completely ignored the agentic outputs I'd already set up. Here's the thing: the plan was simple. 1) Give the LLM room to think and pick its tools, 2) Get the same result, just way better. Claude nailed the mechanics (swap out pipelines for a registry), but totally missed the point: maintain output consistency while making routing more flexible. That's on me. I outlined the architecture I wanted, but forgot to spell out the output contract. Classic context engineering fail. 🤦🏻♂️
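For what it's worth, "spelling out the output contract" can be as simple as a schema plus an invariant check that must hold before and after the migration. A hypothetical sketch (my real output shape isn't shown here):

```python
# Hypothetical output contract: the fields and invariants that any
# routing strategy, pipeline or registry, must preserve.
from dataclasses import dataclass, field

@dataclass
class AgentOutput:
    intent: str
    body: str
    citations: list[str] = field(default_factory=list)

def satisfies_contract(output: AgentOutput) -> bool:
    """The check I should have handed Claude Code up front:
    same fields, same invariants, regardless of how tools are chosen."""
    return bool(output.intent) and bool(output.body)
```

Had this existed as an explicit artifact in the repo, "get the same result, just way better" would have been a testable claim instead of an unstated intention.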
It overwrote the existing UX patterns with something weaker. This was, by far, the most frustrating part. It's not that Claude Code had no opinion on the UX; it had an opinion. It just didn't match the existing patterns, and what it implemented fell short of my bar. The result felt vibe-coded: functional but generic. I spent hours reworking it by hand in Cursor. 😫
I'm extremely picky about UX. In hindsight, I'd run a significant change like this through a dedicated design tool like Pencil first, then let Claude Code implement against that spec rather than letting it freestyle the interface.
What I Learned
I set this up to fail in specific, instructive ways. By providing minimal context, I was testing the floor of what Claude Code can do autonomously. The floor is very high. Claude Code can navigate a complex codebase and execute a substantial architectural migration with minimal guidance. But the floor and ceiling are (currently) still a ways apart, and the gap between them is entirely filled by context engineering.
The "passive contribution" model is seductive but dangerous. It felt great to check in casually and give thumbs-up or slight redirects. But this isn't how you'd work with a human engineer either. If you hired a senior engineer, dropped them into a 15,000-line codebase with no onboarding, and only checked in via text a few times a day, you'd expect exactly the problems I got: technically competent work that doesn't respect the existing culture of the product.
The architectural migration was the wrong level of autonomy. Migrating from hardcoded pipelines to a tool registry is a high-judgment task. It requires understanding not just what the code does, but why it's structured the way it is, what tradeoffs were considered, and what constraints I care about. That's not a task you delegate with a run-on paragraph of instruction - to a human or an AI.
How to Do This Better
If I were running this experiment again, here's what I'd change.
Write a CLAUDE.md file first
Thirty minutes of upfront context saves hours of cleanup. Cover your architecture and why it's structured that way, your design principles, what not to touch, and your code conventions. It's the difference between onboarding an engineer and just giving them repo access.
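As an illustrative skeleton (not my actual file), the structure could look like this:

```markdown
# CLAUDE.md

## Architecture
What the major pieces are and, crucially, *why* they're structured
that way: which tradeoffs were deliberate, which are accidental.

## Design principles
The UX bar: existing interaction patterns, loading states,
error messaging conventions to match.

## Do not touch
Stable modules, output contracts, and public interfaces that
changes must preserve unless explicitly asked.

## Conventions
Naming, file layout, and code style the repo already follows.
```

The sections mirror what an engineer would absorb in their first week; writing them down is the onboarding.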
Break the task into phases with checkpoints
Don't say "migrate the architecture." Say: analyze what exists and show me. Propose the new design and show me. Implement one layer and show me. Each checkpoint is a chance to course-correct before work compounds in the wrong direction. This is especially important on mobile where deep code review is hard.
Give it UX requirements, not just technical ones
Claude Code can implement UX requirements well when you specify them. If the change affects anything user-facing, be explicit about loading states, error messaging, and interaction patterns. Better yet, run significant changes through a design tool first and let Claude Code implement against that spec.
Use a "preserve first" instruction
Something like: "Before making changes, document the existing patterns in use. Your changes should be consistent with these patterns unless I explicitly ask you to change them." This single instruction would have prevented a lot of my mop-up work.
Work on a branch, review on desktop
Direct the work from mobile during the day, then do a focused review session on desktop in the evening. Mobile for steering, desktop for quality control.
The Takeaway
Directing a 15,000-line architectural migration from my phone, in between actual life, still feels bonkers. The phone is a shockingly effective steering wheel, but only if you give clear context and describe what “good” looks like.
I can’t help but think: Coding is becoming commoditized. Designing the context that guides it is the new superpower.