A Day with Claude Code from My iPhone
Here's What Happened. Spoiler Alert: It (Pretty Much) Crushed It.
Abstract
I spent a day using only Claude Code on my phone to rewire a complex agentic system, skipping all the usual onboarding and context you'd give a human engineer. The results were wild: Claude Code handled the technical migration with minimal input, but missed key UX details and output consistency because I didn't spell out what mattered most.
If you want AI to do high-judgment work, you have to give it the right context, checkpoints, and clear instructions, just like you would with a real teammate. Coding is rapidly becoming a commodity skill, but the ability to shape the context and define what 'good' looks like is how you'll actually steer the future.
The Experiment
I spent 24 hours (and every single daily credit) using the Claude Code app on my iPhone. I overhauled the architecture of a core agentic system. Roughly 15,000 lines of code across 25+ files. 🤣
No laptop. No IDE. No special developer setup. Just me, my phone, and a sense that this was either brilliant or reckless.
Here's what went well, what didn't, and the lessons I took away.
The Setup (Or… Lack Thereof)
Here's the thing: I intentionally kept the setup minimal. I gave Claude Code access to my repo and described the change I wanted. That was it. No CLAUDE.md file with project context. No architecture diagrams. No guardrails about what to preserve. I wanted to see what happens when you treat Claude Code like a senior engineer you just dropped into your codebase with zero onboarding.
The change was ambitious. My app used a rigid agentic architecture: a predefined input/output chain of agents wired together in explicit pipelines. Each pipeline mapped to a specific user intent that I detected from incoming prompts. Intent A triggers Pipeline A, which calls Agent 1 → Agent 2 → Agent 3 in fixed order.
This worked, but it was brittle. Every new intent meant wiring up a new pipeline. Worse, it locked future models into a reasoning path I designed, rather than letting the models leverage their own increasingly sophisticated reasoning to choose the right approach. I was essentially micromanaging the LLM.
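To make the brittleness concrete, here's a minimal Python sketch of the old pattern. The agent names and intents are placeholders I invented for illustration; the real system's agents aren't shown here.

```python
# Hypothetical sketch of the rigid pattern: each detected intent maps
# to a fixed, hand-wired chain of agents.
from typing import Callable

Agent = Callable[[str], str]

def summarize(text: str) -> str:      # placeholder agents standing in
    return f"summary({text})"         # for the real system's agents

def enrich(text: str) -> str:
    return f"enriched({text})"

def format_output(text: str) -> str:
    return f"formatted({text})"

# Every new intent means wiring up a new explicit pipeline by hand.
PIPELINES: dict[str, list[Agent]] = {
    "intent_a": [summarize, enrich, format_output],
    "intent_b": [enrich, format_output],
}

def run(intent: str, prompt: str) -> str:
    result = prompt
    for agent in PIPELINES[intent]:   # fixed order; the model gets no say
        result = agent(result)
    return result
```

The fixed `for` loop is the whole problem: the reasoning path is frozen at design time, no matter how capable the underlying model gets.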
The migration target: a tool registry and defined output guideline pattern. Instead of hardcoded pipelines, the LLM would have access to a registry of tools and clear guidelines for what the output needed to look like. It could then decide which tools to invoke, in what order, based on the detected intent. More like giving a great problem-solver a toolbox rather than a step-by-step checklist.
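In the target shape, tools register themselves with descriptions, and per-intent output guidelines replace hardcoded chains; the model sees both and picks its own path. A minimal sketch under the same hypothetical names as above (the actual registry and prompt format in my system differ):

```python
# Hypothetical sketch of the target pattern: a tool registry plus
# output guidelines, with tool selection left to the model.
from typing import Callable

TOOL_REGISTRY: dict[str, dict] = {}

def tool(name: str, description: str):
    """Register a function so the LLM can discover and invoke it."""
    def wrap(fn: Callable[[str], str]):
        TOOL_REGISTRY[name] = {"fn": fn, "description": description}
        return fn
    return wrap

@tool("summarize", "Condense text while preserving key facts")
def summarize(text: str) -> str:
    return f"summary({text})"

@tool("enrich", "Add supporting context to a draft")
def enrich(text: str) -> str:
    return f"enriched({text})"

# Guidelines describe what the output must look like per intent,
# instead of prescribing which agents run in which order.
OUTPUT_GUIDELINES = {
    "intent_a": "Return a concise, enriched summary in the house style.",
}

def build_prompt(intent: str, user_input: str) -> str:
    """Assemble what the model sees: the toolbox, not a checklist."""
    tools = "\n".join(
        f"- {name}: {t['description']}" for name, t in TOOL_REGISTRY.items()
    )
    return (
        f"Available tools:\n{tools}\n"
        f"Output guideline: {OUTPUT_GUIDELINES[intent]}\n"
        f"Input: {user_input}"
    )
```

The orchestration loop that actually lets the model call these tools is omitted; the point is that adding a new capability now means registering one tool and writing one guideline, not wiring a new pipeline.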
What Actually Happened
I kicked off the task from my phone, described the architectural vision, and let Claude Code run. (🏃🏻♂️💨)
And it worked. Kind of.
Over the course of the day, Claude Code:
Identified the existing pipeline architecture across the codebase
Created the tool registry abstraction
Defined output guidelines for each intent category
Rewired the agent orchestration to use dynamic tool selection
Updated 25+ files to support the new pattern
Kept the system functional end-to-end (it still ran)
That's impressive. With minimal context, it understood the existing architecture well enough to replace it with something fundamentally different. On a phone. While I was living my life.
The Mobile Experience
I'll say this plainly: the Claude Code mobile experience is much better than you'd expect. No special setup, no "Remote Control" tethered to a laptop terminal. I just opened the Claude app on my iPhone, connected it to my repo, and started working. My phone was the only device I used all day. The laptop stayed closed.
That's worth emphasizing because people assume mobile means a compromised experience. It wasn't. I could describe architectural changes, review what Claude Code produced, course-correct when it went in the wrong direction, and keep the whole thing moving.
There are limitations, though. Reviewing 25 changed files on a phone screen is not the same as doing it in Cursor. You're working with compact artifact views that are fine for quick checks, but not for careful line-by-line review. And I didn't invest enough time in figuring out how to QA the visual design from the phone.
But the core loop - describe what you want, let it run, check in, redirect - works amazingly well. The voice mode is especially nice for thinking out loud while you're walking around. I’m genuinely blown away.
Where It Went Wrong
Claude Code completely ignored the agentic outputs I'd already set up. Here's the thing: the plan was simple. 1) Give the LLM room to think and pick its tools, 2) Get the same result, just way better. Claude nailed the mechanics (swap out pipelines for a registry), but totally missed the point: maintain output consistency while making routing more flexible. That's on me. I outlined the architecture I wanted, but forgot to spell out the output contract. Classic context engineering fail. 🤦🏻♂️
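For what it's worth, "spelling out the output contract" can be as simple as a schema plus an invariant check that must hold before and after the migration. A hypothetical sketch (my real output shape isn't shown here):

```python
# Hypothetical output contract: the fields and invariants that any
# routing strategy, pipeline or registry, must preserve.
from dataclasses import dataclass, field

@dataclass
class AgentOutput:
    intent: str
    body: str
    citations: list[str] = field(default_factory=list)

def satisfies_contract(output: AgentOutput) -> bool:
    """The check I should have handed Claude Code up front:
    same fields, same invariants, regardless of how tools are chosen."""
    return bool(output.intent) and bool(output.body)
```

Had this existed as an explicit artifact in the repo, "get the same result, just way better" would have been a testable claim instead of an unstated intention.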
It overwrote the existing UX patterns with something weaker. This was, by far, the most frustrating part. It's not that Claude Code had no opinion on the UX; it had an opinion. It just didn't match the existing patterns, and what it implemented fell short of my bar. The result felt vibe-coded: functional but generic. I spent hours reworking it by hand in Cursor. 😫
I'm extremely picky about UX. In hindsight, I'd run a significant change like this through a dedicated design tool like Pencil first, then let Claude Code implement against that spec rather than letting it freestyle the interface.
What I Learned
I set this up to fail in specific, instructive ways. By providing minimal context, I was testing the floor of what Claude Code can do autonomously. The floor is very high. Claude Code can navigate a complex codebase and execute a substantial architectural migration with minimal guidance. But the floor and ceiling are (currently) still a ways apart, and the gap between them is entirely filled by context engineering.
The "passive contribution" model is seductive but dangerous. It felt great to check in casually and give thumbs-up or slight redirects. But this isn't how you'd work with a human engineer either. If you hired a senior engineer, dropped them into a 15,000-line codebase with no onboarding, and only checked in via text a few times a day, you'd expect exactly the problems I got: technically competent work that doesn't respect the existing culture of the product.
The architectural migration was the wrong level of autonomy. Migrating from hardcoded pipelines to a tool registry is a high-judgment task. It requires understanding not just what the code does, but why it's structured the way it is, what tradeoffs were considered, and what constraints I care about. That's not a task you delegate with a run-on paragraph of instruction - to a human or an AI.
How to Do This Better
If I were running this experiment again, here's what I'd change.
Write a CLAUDE.md file first
Thirty minutes of upfront context saves hours of cleanup. Cover your architecture and why it's structured that way, your design principles, what not to touch, and your code conventions. It's the difference between onboarding an engineer and just giving them repo access.
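As an illustrative skeleton (not my actual file), the structure could look like this:

```markdown
# CLAUDE.md

## Architecture
What the major pieces are and, crucially, *why* they're structured
that way: which tradeoffs were deliberate, which are accidental.

## Design principles
The UX bar: existing interaction patterns, loading states,
error messaging conventions to match.

## Do not touch
Stable modules, output contracts, and public interfaces that
changes must preserve unless explicitly asked.

## Conventions
Naming, file layout, and code style the repo already follows.
```

The sections mirror what an engineer would absorb in their first week; writing them down is the onboarding.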
Break the task into phases with checkpoints
Don't say "migrate the architecture." Say: analyze what exists and show me. Propose the new design and show me. Implement one layer and show me. Each checkpoint is a chance to course-correct before work compounds in the wrong direction. This is especially important on mobile where deep code review is hard.
Give it UX requirements, not just technical ones
Claude Code can implement UX requirements well when you specify them. If the change affects anything user-facing, be explicit about loading states, error messaging, and interaction patterns. Better yet, run significant changes through a design tool first and let Claude Code implement against that spec.
Use a "preserve first" instruction
Something like: "Before making changes, document the existing patterns in use. Your changes should be consistent with these patterns unless I explicitly ask you to change them." This single instruction would have prevented a lot of my mop-up work.
Work on a branch, review on desktop
Direct the work from mobile during the day, then do a focused review session on desktop in the evening. Mobile for steering, desktop for quality control.
The Takeaway
Directing a 15,000-line architectural migration from my phone, in between actual life, still feels bonkers. The phone is a shockingly effective steering wheel, but only if you give clear context and describe what “good” looks like.
I can’t help but think: Coding is becoming commoditized. Designing the context that guides it is the new superpower.