AI Coding Tools Comparison

Methodology & Results

The Experiment

I gave three AI coding tools (Claude Code, Cursor, and Copilot) the same detailed prompt: build a 6-card glassmorphic dashboard with live API integrations, AI-generated images, weather forecasts, financial widgets, graphs, and aurora animations.

Then I measured what it took to get each one working, with little to no human modification of the code, and whether it worked at all. I deliberately made the project more complex than it needed to be, with multiple languages, external APIs, and specific visual design requirements.

Live Demos

Test each AI-generated dashboard yourself; the final output from each tool is deployed at bdai.ahzaz.io.

Project Requirements

Each tool received a comprehensive prompt specifying:

Folder Structure

/
├── index.html
├── css/
│   └── styles.css
├── js/
│   └── scripts.js
├── python/
│   ├── server.py
│   ├── news.py
│   ├── space.py
│   └── requirements.txt
├── data/
│   ├── images/
│   │   ├── 1.jpg, 2.jpg, 3.jpg, 4.jpg (news thumbnails)
│   │   └── aurora.webp (animated aurora)
│   └── json/
│       ├── 1.json, 2.json, 3.json, 4.json (news articles)
│       └── aurora.json (Kp-index data)
├── setup.sh / setup.bat
└── README.md
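
For reference, here is a minimal sketch of what python/server.py could look like, assuming a plain standard-library static server. Each tool's actual implementation and port handling differed, so treat this only as the shape implied by the layout above.

# Hypothetical sketch of python/server.py: a bare static file server for
# local testing. The port argument and directory handling are assumptions.
import http.server
import socketserver
import sys
from pathlib import Path

PORT = int(sys.argv[1]) if len(sys.argv) > 1 else 8000
ROOT = Path(__file__).resolve().parent.parent  # serve the project root

class Handler(http.server.SimpleHTTPRequestHandler):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, directory=str(ROOT), **kwargs)

if __name__ == "__main__":
    with socketserver.TCPServer(("", PORT), Handler) as httpd:
        print(f"Serving {ROOT} at http://localhost:{PORT}")
        httpd.serve_forever()

Running it as, say, python python/server.py 8001 lines up with the "assigned ports" step in the methodology below.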

Design System

The 6 Cards

  1. Greeting & City Image: Time-based greeting, live clock, location detection, and an AI-generated retrofuturistic city image of the user's detected location via the Pollinations API
  2. Weather Dashboard: Current conditions, 3-day forecast from OpenWeather; day/night icons from Google Weather API
  3. Global News: Top 4 articles from an RSS feed, each with thumbnail, title, and description (a sketch of the fetch script follows this list)
  4. Financial Dashboard: TradingView top stories widget + stock heatmap
  5. Northern Lights: Kp-index bar graph (gradient green→yellow→red), aurora animation from NOAA images, educational content
  6. Footer: Contact info, tool/model identification, copyright

Plus a ticker tape widget between cards 2 and 3 showing SPX, NASDAQ, DJI, EUR/USD, BTC, ETH, Gold.
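
To make the data layout concrete, here is a rough sketch of how a news.py-style fetcher could populate data/json/1.json through 4.json. The feed URL and field mapping are placeholders, and thumbnail download is omitted; the prompt's actual RSS source and each tool's implementation differed.

# Hypothetical sketch of news.py: fetch an RSS feed and cache the top four
# stories as data/json/1.json..4.json. FEED_URL is a placeholder, not the
# feed specified in the prompt.
import json
import urllib.request
import xml.etree.ElementTree as ET
from pathlib import Path

FEED_URL = "https://example.com/world-news.rss"  # placeholder
OUT_DIR = Path(__file__).resolve().parent.parent / "data" / "json"

def fetch_top_articles(limit: int = 4) -> list[dict]:
    with urllib.request.urlopen(FEED_URL, timeout=10) as resp:
        root = ET.fromstring(resp.read())
    articles = []
    for item in root.iter("item"):  # standard RSS 2.0 <item> elements
        articles.append({
            "title": item.findtext("title", default=""),
            "description": item.findtext("description", default=""),
            "link": item.findtext("link", default=""),
        })
        if len(articles) == limit:
            break
    return articles

if __name__ == "__main__":
    OUT_DIR.mkdir(parents=True, exist_ok=True)
    for i, article in enumerate(fetch_top_articles(), start=1):
        (OUT_DIR / f"{i}.json").write_text(json.dumps(article, indent=2))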

Additional Requirements

Methodology

Process

  1. Initial Prompt: Each tool received the identical comprehensive specification
  2. Fix Iterations: 1-2 follow-up prompts to address issues and add enhancements
  3. Consistent Format: All corrections used the same structured format across tools
  4. Local Testing: Each version run locally on assigned ports for review
  5. Final Export: Final outputs from each tool deployed to bdai.ahzaz.io

Evaluation Criteria

Metric                   What We Measured
Messages to Completion   Total prompts needed for a working result
Human Intervention       Manual edits, file creation, debugging required
Features Working         How many of the 6 cards rendered correctly
Visual Quality           Design execution, polish, attention to detail
Self-Diagnosis           Did the tool identify and fix its own errors?
Code Analysis            Token-based clone detection across outputs (sketched below)
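
The duplication numbers in the next section come from token-based clone detection, but the specific tool isn't named here, so the following is only a simplified illustration of the idea: tokenize each file with a crude regex, slide a fixed-size window over the token stream, and report the fraction of window occurrences that repeat. The file paths in the example are hypothetical.

# Simplified token-based clone detection sketch (not the tool used in the study).
import re
from collections import Counter
from pathlib import Path

TOKEN_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*|\d+|\S")  # crude tokenizer
WINDOW = 30  # tokens per window; real detectors tune this per language

def tokens(path: Path) -> list[str]:
    return TOKEN_RE.findall(path.read_text(errors="ignore"))

def duplication_ratio(files: list[Path], window: int = WINDOW) -> float:
    """Fraction of token-window occurrences that belong to a repeated window."""
    counts = Counter()
    total = 0
    for f in files:
        toks = tokens(f)
        for i in range(max(0, len(toks) - window + 1)):
            counts[tuple(toks[i:i + window])] += 1
            total += 1
    if total == 0:
        return 0.0
    duplicated = sum(c for c in counts.values() if c > 1)
    return duplicated / total

if __name__ == "__main__":
    # Example: compare the CSS each tool produced (paths are hypothetical).
    css_files = [Path(p) for p in ["claude/css/styles.css",
                                   "cursor/css/styles.css",
                                   "copilot/css/styles.css"]]
    print(f"CSS duplication: {duplication_ratio(css_files):.0%}")

Real clone detectors normalize identifiers and literals and tune the window size per language; this only conveys the mechanism behind the percentages reported below.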

Results Summary

CSS Duplication:     34%
HTML Duplication:    15%
JS Duplication:      <1%
Python Duplication:  <1%
Metric             Claude Code   Cursor     Copilot
Setup Friction     High          Low        Medium
Prompts Required   2             3          5+
Features Working   5.5 / 6       4 / 6      2.5 / 6
Visual Quality     9/10          8/10       6/10
Self-Diagnosis     Yes           Partial    No
Aurora Animation   ✓ Perfect     ✗ Failed   ✗ Failed

"Different AI tools produced stylistically different code, but converged on the same underlying logic patterns."

Key Findings

Why This Matters for AppSec

As AI-generated code proliferates across enterprises, the need for automated scanning — SAST, SCA, license compliance — becomes not just important, but essential.

This experiment demonstrated that even with identical requirements, AI tools produce different code patterns, choose different dependencies, and handle security considerations differently. Manual review simply cannot scale to catch these variations across an organization with hundreds of developers using AI assistants.