AI Coding Tools Comparison

Methodology & Results

The Experiment

I gave three AI coding tools (Claude Code, Cursor, and Copilot) the same detailed prompt: build a 6-card glassmorphic dashboard with live API integrations, AI-generated images, weather forecasts, financial widgets, graphs, and aurora animations.

Then I measured what it took to get each one working, with little to no human modification of the code, and whether it worked at all. I deliberately made the project more complex than it needed to be, with multiple languages, external APIs, and specific visual design requirements.

Live Demos

Test each AI-generated dashboard yourself; the final output from each tool is deployed at bdai.ahzaz.io.

Project Requirements

Each tool received a comprehensive prompt specifying:

Folder Structure

/
├── index.html
├── css/
│   └── styles.css
├── js/
│   └── scripts.js
├── python/
│   ├── server.py
│   ├── news.py
│   ├── space.py
│   └── requirements.txt
├── data/
│   ├── images/
│   │   ├── 1.jpg, 2.jpg, 3.jpg, 4.jpg (news thumbnails)
│   │   └── aurora.webp (animated aurora)
│   └── json/
│       ├── 1.json, 2.json, 3.json, 4.json (news articles)
│       └── aurora.json (Kp-index data)
├── setup.sh / setup.bat
└── README.md
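
For reference, here is a minimal sketch of what python/server.py could look like, assuming a plain standard-library static server. Each tool's actual implementation and port handling differed, so treat this only as the shape implied by the layout above.

# Hypothetical sketch of python/server.py: a bare static file server for
# local testing. The port argument and directory handling are assumptions.
import http.server
import socketserver
import sys
from pathlib import Path

PORT = int(sys.argv[1]) if len(sys.argv) > 1 else 8000
ROOT = Path(__file__).resolve().parent.parent  # serve the project root

class Handler(http.server.SimpleHTTPRequestHandler):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, directory=str(ROOT), **kwargs)

if __name__ == "__main__":
    with socketserver.TCPServer(("", PORT), Handler) as httpd:
        print(f"Serving {ROOT} at http://localhost:{PORT}")
        httpd.serve_forever()

Running it as, say, python python/server.py 8001 lines up with the "assigned ports" step in the methodology below.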

Design System

The 6 Cards

  1. Greeting & City Image: Time-based greeting, live clock, location detection, and an AI-generated retrofuturistic city image of the user's detected location via the Pollinations API
  2. Weather Dashboard: Current conditions, 3-day forecast from OpenWeather; day/night icons from Google Weather API
  3. Global News: Top 4 articles from an RSS feed, each with thumbnail, title, and description (a sketch of the fetch script follows this list)
  4. Financial Dashboard: TradingView top stories widget + stock heatmap
  5. Northern Lights: Kp-index bar graph (gradient green→yellow→red), aurora animation from NOAA images, educational content
  6. Footer: Contact info, tool/model identification, copyright

Plus a ticker tape widget between cards 2 and 3 showing SPX, NASDAQ, DJI, EUR/USD, BTC, ETH, Gold.
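
To make the data layout concrete, here is a rough sketch of how a news.py-style fetcher could populate data/json/1.json through 4.json. The feed URL and field mapping are placeholders, and thumbnail download is omitted; the prompt's actual RSS source and each tool's implementation differed.

# Hypothetical sketch of news.py: fetch an RSS feed and cache the top four
# stories as data/json/1.json..4.json. FEED_URL is a placeholder, not the
# feed specified in the prompt.
import json
import urllib.request
import xml.etree.ElementTree as ET
from pathlib import Path

FEED_URL = "https://example.com/world-news.rss"  # placeholder
OUT_DIR = Path(__file__).resolve().parent.parent / "data" / "json"

def fetch_top_articles(limit: int = 4) -> list[dict]:
    with urllib.request.urlopen(FEED_URL, timeout=10) as resp:
        root = ET.fromstring(resp.read())
    articles = []
    for item in root.iter("item"):  # standard RSS 2.0 <item> elements
        articles.append({
            "title": item.findtext("title", default=""),
            "description": item.findtext("description", default=""),
            "link": item.findtext("link", default=""),
        })
        if len(articles) == limit:
            break
    return articles

if __name__ == "__main__":
    OUT_DIR.mkdir(parents=True, exist_ok=True)
    for i, article in enumerate(fetch_top_articles(), start=1):
        (OUT_DIR / f"{i}.json").write_text(json.dumps(article, indent=2))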

Additional Requirements

Methodology

Process

  1. Initial Prompt: Each tool received the identical comprehensive specification
  2. Fix Iterations: 1-2 follow-up prompts to address issues and add enhancements
  3. Consistent Format: All corrections used the same structured format across tools
  4. Local Testing: Each version run locally on assigned ports for review
  5. Final Export: Final outputs from each tool deployed to bdai.ahzaz.io

Evaluation Criteria

Metric                   What We Measured
Messages to Completion   Total prompts needed for a working result
Human Intervention       Manual edits, file creation, debugging required
Features Working         How many of the 6 cards rendered correctly
Visual Quality           Design execution, polish, attention to detail
Self-Diagnosis           Did the tool identify and fix its own errors?
Code Analysis            Token-based clone detection across outputs (sketched below)
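
The duplication numbers in the next section come from token-based clone detection, but the specific tool isn't named here, so the following is only a simplified illustration of the idea: tokenize each file with a crude regex, slide a fixed-size window over the token stream, and report the fraction of window occurrences that repeat. The file paths in the example are hypothetical.

# Simplified token-based clone detection sketch (not the tool used in the study).
import re
from collections import Counter
from pathlib import Path

TOKEN_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*|\d+|\S")  # crude tokenizer
WINDOW = 30  # tokens per window; real detectors tune this per language

def tokens(path: Path) -> list[str]:
    return TOKEN_RE.findall(path.read_text(errors="ignore"))

def duplication_ratio(files: list[Path], window: int = WINDOW) -> float:
    """Fraction of token-window occurrences that belong to a repeated window."""
    counts = Counter()
    total = 0
    for f in files:
        toks = tokens(f)
        for i in range(max(0, len(toks) - window + 1)):
            counts[tuple(toks[i:i + window])] += 1
            total += 1
    if total == 0:
        return 0.0
    duplicated = sum(c for c in counts.values() if c > 1)
    return duplicated / total

if __name__ == "__main__":
    # Example: compare the CSS each tool produced (paths are hypothetical).
    css_files = [Path(p) for p in ["claude/css/styles.css",
                                   "cursor/css/styles.css",
                                   "copilot/css/styles.css"]]
    print(f"CSS duplication: {duplication_ratio(css_files):.0%}")

Real clone detectors normalize identifiers and literals and tune the window size per language; this only conveys the mechanism behind the percentages reported below.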

Results Summary

CSS Duplication:     34%
HTML Duplication:    15%
JS Duplication:      <1%
Python Duplication:  <1%
Metric             Claude Code   Cursor     Copilot
Setup Friction     High          Low        Medium
Prompts Required   2             3          5+
Features Working   5.5 / 6       4 / 6      2.5 / 6
Visual Quality     9/10          8/10       6/10
Self-Diagnosis     Yes           Partial    No
Aurora Animation   ✓ Perfect     ✗ Failed   ✗ Failed

"Different AI tools produced stylistically different code, but converged on the same underlying logic patterns."

Key Findings

Why This Matters for AppSec

As AI-generated code proliferates across enterprises, the need for automated scanning — SAST, SCA, license compliance — becomes not just important, but essential.

This experiment demonstrated that even with identical requirements, AI tools produce different code patterns, choose different dependencies, and handle security considerations differently. Manual review simply cannot scale to catch these variations across an organization with hundreds of developers using AI assistants.