Last year we built a design system for a fintech client. Tokens, components, documentation, handoff specs. It took fourteen weeks. Two designers, one developer, a lot of Slack threads about naming conventions.
Last month we did it again for a different client. Three days. One designer with Claude Code and the Figma MCP server. The output was comparable. In some areas it was better.
That's not a flex. It's a warning about where the industry is going.
👉 What Three Days Actually Looked Like
Day one was token architecture. Instead of spending two weeks debating naming conventions in a Notion doc, we prompted Claude Code with our requirements: three-layer token system (primitives, semantics, components), DTCG standard, support for light and dark modes, and a naming convention that maps cleanly to CSS custom properties.
Claude generated the full token structure in about forty minutes. Not perfect. We adjusted the color ramp, renamed some semantic tokens, and reorganized the spacing scale. But the scaffolding was solid. The thing that used to take two weeks of back-and-forth took a morning.
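To make the three-layer idea concrete, here's a minimal sketch of the kind of structure we asked for: primitives hold raw values, semantics alias primitives, component tokens alias semantics, and token paths map mechanically to CSS custom property names. The token names and values below are illustrative, not the client's actual system.

```typescript
// Illustrative three-layer token tree. Names and values are hypothetical.
type TokenTree = Record<string, string>;

const tokens: TokenTree = {
  // Layer 1: primitives (raw values)
  "color.blue.600": "#2552cc",
  "color.gray.900": "#16181d",
  // Layer 2: semantics (DTCG-style {alias} references)
  "color.action.primary": "{color.blue.600}",
  "color.text.default": "{color.gray.900}",
  // Layer 3: component tokens
  "button.primary.background": "{color.action.primary}",
};

// Follow {alias} references until a raw value is reached.
function resolve(name: string, tree: TokenTree = tokens): string {
  const value = tree[name];
  if (value === undefined) throw new Error(`Unknown token: ${name}`);
  const alias = value.match(/^\{(.+)\}$/);
  return alias ? resolve(alias[1], tree) : value;
}

// The dot-separated token path maps cleanly to a CSS custom property.
function toCssVar(name: string): string {
  return `--${name.replace(/\./g, "-")}`;
}
```

So `resolve("button.primary.background")` walks component → semantic → primitive, and `toCssVar("color.action.primary")` yields `--color-action-primary`. That mechanical mapping is exactly why a consistent naming convention matters more than any individual name.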
Day two was components. We designed the core set in Figma: buttons, inputs, cards, modals, navigation. Then connected Claude Code via MCP. The agent read the Figma components, cross-referenced our token structure, and generated React components using our actual token values. Again, not perfect. We tweaked interaction states, fixed some responsive behavior, adjusted accessibility attributes. But the foundation was there.
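The shape of that generated scaffolding looked roughly like this: component styles expressed entirely through token-backed CSS custom properties, never hard-coded values. This is a simplified sketch, and the variant and token names are hypothetical, not the agent's literal output.

```typescript
// Sketch of token-driven component scaffolding. Every visual value points
// at a CSS custom property, so the component inherits theme changes
// (light/dark) for free. Variant and token names are illustrative.
type ButtonVariant = "primary" | "secondary";

interface ButtonStyle {
  background: string;
  color: string;
  padding: string;
}

function buttonStyle(variant: ButtonVariant): ButtonStyle {
  const styles: Record<ButtonVariant, ButtonStyle> = {
    primary: {
      background: "var(--button-primary-background)",
      color: "var(--button-primary-text)",
      padding: "var(--space-inset-md)",
    },
    secondary: {
      background: "var(--button-secondary-background)",
      color: "var(--button-secondary-text)",
      padding: "var(--space-inset-md)",
    },
  };
  return styles[variant];
}
```

In the React components this feeds a `style` prop or a CSS-in-JS layer; the human pass then adds what the agent missed, interaction states and accessibility attributes.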
Day three was documentation and cleanup. Claude Code generated component usage guidelines, token reference tables, and implementation notes based on the system it had just helped build. We edited for accuracy and voice. We ran accessibility audits. We stress-tested edge cases.
By end of day three, the client had a working design system with tokens, core components, and documentation. Not a prototype. A system they could use.
🔥 What AI Did Well
Token generation. AI is excellent at generating structured, consistent naming systems. Better than most humans, honestly, because it doesn't get lazy halfway through and start abbreviating things differently.
Component scaffolding. Given a clear Figma reference and token system, Claude Code generates component code that's 70-80% there. The structure is right. The token usage is correct. The remaining 20-30% is interaction nuance, edge cases, and polish.

Documentation. Writing component guidelines, listing props, describing usage patterns. AI handles this well because it can reference the code it just generated. The docs are accurate because they come from the source, not from a designer's memory of what they intended.
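Docs-from-source can be as simple as generating a props table from the same metadata the components are built from, so the docs can't drift from the code. A minimal sketch, assuming a hypothetical `PropDoc` metadata shape rather than any real docgen API:

```typescript
// Generate a markdown props table from component prop metadata.
// The PropDoc shape is an assumption for illustration, not a real API.
interface PropDoc {
  name: string;
  type: string;
  description: string;
}

function propsTable(component: string, props: PropDoc[]): string {
  // Escape pipes so union types don't break the markdown table.
  const esc = (s: string): string => s.replace(/\|/g, "\\|");
  const header = `### ${component}\n\n| Prop | Type | Description |\n| --- | --- | --- |`;
  const rows = props.map(
    (p) => `| \`${p.name}\` | \`${esc(p.type)}\` | ${p.description} |`
  );
  return [header, ...rows].join("\n");
}
```

Run over every component, that produces a reference section that stays accurate as long as the metadata does; the human edit pass is then about voice and usage guidance, not correctness.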
🧠 What Still Needs a Human
Design decisions. Claude Code doesn't know your brand. It doesn't understand why your buttons should feel heavy instead of light. It doesn't have an opinion about whether your input fields should feel clinical or warm. Those are taste decisions that define how a product feels, and AI can't make them for you.
Edge cases. The agent handles the happy path well. It struggles with error states that need specific copywriting, loading patterns that depend on API behavior, and responsive breakpoints that need human judgment about what to show and hide.
System integrity. An AI can generate consistent tokens. But deciding whether a system needs a fourth semantic layer, or whether two components should merge, or whether a pattern is too complex for the team to maintain. Those are architecture decisions that require understanding the team, the product, and the roadmap.
⚠️ What We Got Wrong
We initially tried to let Claude Code do too much at once. Generate the whole system in one prompt. That produced coherent-looking output that fell apart when we tested it. The tokens were internally consistent but didn't map well to real usage patterns.
The breakthrough was treating AI as a speed multiplier for each step, not as a replacement for the process. Design the token architecture manually. Then use AI to generate it. Design the components in Figma. Then use AI to code them. Write the doc outline. Then use AI to fill it in.
Human decisions, AI execution. That's the pattern that worked.
The fourteen-week version wasn't better because it took longer. It was slower because every step was manual. The three-day version wasn't worse because it was fast. It was faster because the manual parts had been automated.
What didn't change: someone still needs to know what good looks like. AI just makes the distance between knowing and shipping a lot shorter.