Data Diff Checker
CSV regression testing tool for API workflows
The Problem
When working with e-commerce data feeds, you often need to compare API responses between production and development environments. A change that looks harmless in code can silently break field mappings, strip HTML content, or introduce subtle data inconsistencies.
Manual comparison doesn't scale when you're dealing with thousands of products and dozens of fields per product.
The Solution
Data Diff Checker is a Python CLI tool that automates regression testing for CSV-based API responses. It fetches data from both environments concurrently, normalizes the output, and generates detailed diff reports showing exactly what changed.
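As a rough illustration of the normalize-then-diff step, the sketch below indexes both CSVs by a key column and compares them field by field. The key column name (`id`) and the normalization rule are assumptions, not the tool's documented behaviour. The rule ignores whitespace next to HTML tags and collapses other whitespace runs, but it keeps the tags themselves, so markup stripped by a bad mapping still shows up as a change instead of being normalized away.

```python
import csv
import io
import re
from dataclasses import dataclass, field

# Assumption: whitespace touching an HTML tag is insignificant, but tags are kept.
TAG_WS_RE = re.compile(r"\s*(<[^>]+>)\s*")
WS_RE = re.compile(r"\s+")


def normalize(value: str) -> str:
    """Drop whitespace around tags, then collapse remaining whitespace runs."""
    text = TAG_WS_RE.sub(r"\1", value)
    return WS_RE.sub(" ", text).strip()


@dataclass
class DiffResult:
    added: list[str] = field(default_factory=list)
    removed: list[str] = field(default_factory=list)
    # (key, column, prod_value, dev_value) for every field that differs
    changed: list[tuple[str, str, str, str]] = field(default_factory=list)


def load_rows(text: str, key: str) -> dict[str, dict[str, str]]:
    """Index rows by the key column, normalizing every field value."""
    rows = {}
    for row in csv.DictReader(io.StringIO(text)):
        rows[row[key]] = {
            col: normalize(val or "")  # short rows yield None values
            for col, val in row.items()
            if col is not None  # skip overflow fields from over-long rows
        }
    return rows


def diff_csv(prod_text: str, dev_text: str, key: str = "id") -> DiffResult:
    """Compare prod and dev CSV text row by row, then field by field."""
    # Assumption: `key` (hypothetical default "id") uniquely identifies a row.
    prod = load_rows(prod_text, key)
    dev = load_rows(dev_text, key)
    result = DiffResult(
        added=sorted(dev.keys() - prod.keys()),
        removed=sorted(prod.keys() - dev.keys()),
    )
    for k in sorted(prod.keys() & dev.keys()):
        before_row, after_row = prod[k], dev[k]
        for col in sorted(before_row.keys() | after_row.keys()):
            before = before_row.get(col, "")
            after = after_row.get(col, "")
            if before != after:
                result.changed.append((k, col, before, after))
    return result
```

The lengths of `added`, `removed`, and `changed` could feed a summary like the `+0 added, -0 removed, ~3 changed` line in the sample activity log. For clarity this version holds both files in memory; the streaming behaviour listed under Key Features would need a different structure, such as a single pass over inputs sorted by key.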
Key Features
- High-concurrency fetching — Async Python with configurable parallelism (default: 200 concurrent requests)
- Smart diffing — Handles HTML content, whitespace normalization, and field-level comparison
- Memory optimized — Streams large CSVs without loading everything into memory
- Live progress display — Real-time progress bars and activity log
- Detailed reports — Shows additions, removals, and changes with before/after values
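Bounded parallelism like the 200-request default is commonly built with an `asyncio.Semaphore` shared by every request. The sketch below assumes that approach and mirrors the "prod first" ordering in the sample log. `fetch_one` is a hypothetical coroutine, injected so the example runs without a network; a real implementation would presumably back it with an aiohttp request.

```python
import asyncio
from collections.abc import Awaitable, Callable

# Hypothetical fetcher: (environment, test_id) -> response body as text.
# Assumption: in the real tool this would wrap an aiohttp request.
Fetcher = Callable[[str, int], Awaitable[str]]


async def run_test(test_id: int, fetch_one: Fetcher,
                   sem: asyncio.Semaphore) -> tuple[int, str, str]:
    """Fetch prod first, then dev, holding a slot only while a request is in flight."""
    async with sem:
        prod = await fetch_one("prod", test_id)
    async with sem:
        dev = await fetch_one("dev", test_id)
    return test_id, prod, dev


async def run_all(test_ids: list[int], fetch_one: Fetcher,
                  max_concurrency: int = 200) -> list[tuple[int, str, str]]:
    """Run every test concurrently, with at most `max_concurrency` requests open."""
    sem = asyncio.Semaphore(max_concurrency)
    # gather returns results in input order, whatever order they finish in.
    return await asyncio.gather(
        *(run_test(t, fetch_one, sem) for t in test_ids)
    )
```

Each test releases its slot between the prod and dev requests, so the cap applies to requests in flight rather than to tests. If all requests go through one `aiohttp.ClientSession`, its default `TCPConnector` allows at most 100 open connections, so the effective parallelism would stay at 100 unless the connector's `limit` is raised as well.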
Technical Details
| Component | Details |
| --- | --- |
| Language | Python 3.11+ |
| Async | aiohttp + asyncio |
| Output | CSV diff reports |
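For the CSV diff report, one plausible layout (assumed here, not taken from the tool) is one row per difference, with before/after values filled in only for changed fields. The sketch writes rows as it goes rather than building the whole report first; the column names are hypothetical.

```python
import csv
import io
from collections.abc import Iterable
from typing import TextIO

# Hypothetical report layout: one row per difference, blank cells where N/A.
REPORT_FIELDS = ["test_id", "change", "key", "column", "before", "after"]


def write_report(out: TextIO, test_id: int,
                 added: Iterable[str],
                 removed: Iterable[str],
                 changed: Iterable[tuple[str, str, str, str]]) -> int:
    """Stream diff records to `out` as CSV; return the number of data rows."""
    writer = csv.DictWriter(out, fieldnames=REPORT_FIELDS)
    writer.writeheader()
    count = 0
    for key in added:
        writer.writerow({"test_id": test_id, "change": "added", "key": key})
        count += 1
    for key in removed:
        writer.writerow({"test_id": test_id, "change": "removed", "key": key})
        count += 1
    for key, column, before, after in changed:
        writer.writerow({"test_id": test_id, "change": "changed", "key": key,
                         "column": column, "before": before, "after": after})
        count += 1
    return count


if __name__ == "__main__":
    buf = io.StringIO()
    write_report(buf, 44, [], [],
                 [("1", "description", "<p>Soft cotton</p>", "Soft cotton")])
    print(buf.getvalue())
```

When writing to a file instead of `io.StringIO`, open it with `newline=""`, as the `csv` module documentation recommends.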
Sample Output
┌─ Data Diff Checker ─ Elapsed: 01:23 ────────────────────┐
│ Fetches: [██████░░░░░░░░░░░░░░░░░░░] 48/200 (24.0%)     │
│ Diffs:   [█████░░░░░░░░░░░░░░░░░░░░] 20/100 (20.0%)     │
├─ Recent Activity ───────────────────────────────────────┤
│ 14:32:15 [Test 47] Starting (prod first)...             │
│ 14:32:16 [Test 45] PROD done (status=200)               │
│ 14:32:17 [Test 44] +0 added, -0 removed, ~3 changed     │
│ 14:32:17 [Test 43] No differences                       │
└─────────────────────────────────────────────────────────┘
Why I Built It
I needed a reliable way to verify that changes to data feed configurations wouldn't break existing functionality. After manually comparing CSV exports one too many times, I built this tool to automate the process and catch regressions before they hit production.