Methodology & Results
I gave three AI coding tools (Claude Code, Cursor, and Copilot) the same detailed prompt: build a 6-card glassmorphic dashboard with live API integrations, AI-generated images, weather forecasts, financial widgets, graphs, and aurora animations.
Then I measured what it took to get each one to a working result with little to no manual code changes, and whether it got there at all. I deliberately made the brief more complex than it needed to be, with multiple languages, external APIs, and visual design requirements.
Each tool received a comprehensive prompt specifying:
```
/
├── index.html
├── css/
│   └── styles.css
├── js/
│   └── scripts.js
├── python/
│   ├── server.py
│   ├── news.py
│   ├── space.py
│   └── requirements.txt
├── data/
│   ├── images/
│   │   ├── 1.jpg, 2.jpg, 3.jpg, 4.jpg (news thumbnails)
│   │   └── aurora.webp (animated aurora)
│   └── json/
│       ├── 1.json, 2.json, 3.json, 4.json (news articles)
│       └── aurora.json (Kp-index data)
├── setup.sh / setup.bat
└── README.md
```
The prompt also required a ticker-tape widget between cards 2 and 3 showing SPX, NASDAQ, DJI, EUR/USD, BTC, ETH, and Gold.
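To make the spec concrete, here is a minimal sketch of what `python/space.py` could look like. The prompt only names the file and its `data/json/aurora.json` output; the data source, URL, and caching approach below are assumptions for illustration (NOAA SWPC publishes a public planetary K-index feed).

```python
# Sketch only: the prompt specifies space.py and data/json/aurora.json,
# but not the data source. NOAA SWPC's public Kp-index feed is assumed here.
import json
import urllib.request

KP_URL = "https://services.swpc.noaa.gov/products/noaa-planetary-k-index.json"  # assumed source
OUT_PATH = "data/json/aurora.json"  # matches the prompt's directory layout

def refresh_kp_index() -> None:
    # Download the current Kp-index series and cache it where the
    # front-end's aurora card expects to find it.
    with urllib.request.urlopen(KP_URL, timeout=10) as resp:
        payload = json.load(resp)
    with open(OUT_PATH, "w", encoding="utf-8") as fh:
        json.dump(payload, fh)

if __name__ == "__main__":
    refresh_kp_index()
```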
| Metric | What We Measured |
|---|---|
| Messages to Completion | Total prompts needed to reach a working result |
| Human Intervention | Manual edits, file creation, and debugging required |
| Features Working | How many of the 6 cards rendered correctly |
| Visual Quality | Design execution, polish, attention to detail |
| Self-Diagnosis | Did the tool identify and fix its own errors? |
| Code Analysis | Token-based clone detection across outputs |
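The "Code Analysis" metric refers to token-based clone detection. As a rough illustration of the idea (not the actual tooling used in the experiment), a minimal version compares Jaccard similarity over sliding windows of normalized tokens:

```python
# Minimal sketch of token-based clone detection: tokenize two sources,
# build k-token shingles, and compare them with Jaccard similarity.
import re

def tokens(source: str) -> list[str]:
    # Crude tokenizer: identifiers, numbers, and single punctuation characters
    return re.findall(r"[A-Za-z_]\w*|\d+|[^\s\w]", source)

def shingles(toks: list[str], k: int = 5) -> set[tuple[str, ...]]:
    # Sliding windows of k consecutive tokens ("shingles")
    return {tuple(toks[i:i + k]) for i in range(len(toks) - k + 1)}

def clone_similarity(a: str, b: str, k: int = 5) -> float:
    # Jaccard similarity over shingle sets: 1.0 means identical token streams
    sa, sb = shingles(tokens(a), k), shingles(tokens(b), k)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)

# Stylistically different snippets with the same underlying logic still overlap
claude_js = "const kp = data.kp_index; if (kp >= 5) showAurora();"
cursor_js = "let kp = data.kp_index;   if (kp >= 5) { showAurora(); }"
print(round(clone_similarity(claude_js, cursor_js), 2))  # noticeable overlap despite style differences
```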
| Metric | Claude Code | Cursor | Copilot |
|---|---|---|---|
| Setup Friction | High | Low | Medium |
| Prompts Required | 2 | 3 | 5+ |
| Features Working | 5.5 / 6 | 4 / 6 | 2.5 / 6 |
| Visual Quality | 9/10 | 8/10 | 6/10 |
| Self-Diagnosis | Yes | Partial | No |
| Aurora Animation | ✓ Perfect | ✗ Failed | ✗ Failed |
"Different AI tools produced stylistically different code, but converged on the same underlying logic patterns."
As AI-generated code proliferates across enterprises, automated scanning (SAST, SCA, license compliance) shifts from important to essential.
This experiment demonstrated that even with identical requirements, AI tools produce different code patterns, choose different dependencies, and handle security considerations differently. Manual review simply cannot scale to catch these variations across an organization with hundreds of developers using AI assistants.
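The SCA piece is easy to make concrete: each tool emitted its own `python/requirements.txt`, and an automated dependency audit is the only practical way to check them all. A minimal sketch, assuming the three outputs live in a hypothetical `outputs/` layout and that pip-audit is installed:

```python
# Illustration only: audit each tool's generated requirements.txt with pip-audit.
# The outputs/<tool>/ layout is a hypothetical convention for this sketch.
import subprocess

for tool in ("claude-code", "cursor", "copilot"):
    req = f"outputs/{tool}/python/requirements.txt"
    result = subprocess.run(
        ["pip-audit", "-r", req],  # pip-audit exits non-zero when it reports findings
        capture_output=True,
        text=True,
    )
    status = "no known vulnerabilities" if result.returncode == 0 else "findings reported"
    print(f"{tool}: {status}")
    print(result.stdout)
```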