Wagon Damage Detection: Multi-Camera Computer Vision for Indian Railways

Key facts at a glance

Project facts & technologies

A snapshot of the WDD project — entities, technologies, scale, and outcomes — formatted for quick scanning by readers, search engines, and AI answer engines.

Project name: Wagon Damage Detection (WDD)
Operator / Client: Indian Railways Transport Company
Solution provider: AiSPRY
Industry: Railways · Freight · Industrial computer vision
Use case: Automated wagon inspection for damage and cargo residue
Core technology: YOLOv11 (detection · segmentation · OBB), PaddleOCR, CLAHE preprocessing
Damage classes detected: Closed Door, Open Door, Missing Door, Dents (side); Bulges, Hole, Crack, Gravel (top)
Camera configuration: 6 streams — top, left, right at entry + exit gantries
Data infrastructure: PostgreSQL (reporting) + MongoDB (raw metadata) · Celery · Redis · FastAPI
Pipeline stages: 17-stage automated pipeline (Detection Foundation 1–8 · Advanced Recovery 9–17)
Processing capacity: ~12 trains per day on GPU hardware
Outcomes: 5–8 hr → 2 hr per train · 80–90% manual review reduction · 99.2% frame detection rate · 98% wagon-number OCR accuracy

About the industry

Why does wagon inspection matter in Indian Railways?

Indian Railways operates one of the world's largest freight networks. Every day, thousands of goods carriage wagons are loaded and unloaded across yards and sidings — handled by trucks, forklifts, grabs, magnets, and tipplers. Each handling cycle exposes wagons to mechanical damage: dents on side panels, cracks and holes in roofs, doors that get torn off or left open, and leftover cargo material that compromises the next consignment.

Two operational realities sit behind this case. First, undetected damage compounds: small cracks become structural failures, missing doors lead to lost cargo, and wagons return to maintenance yards far more often than they should. Second, when a dispute arises about who damaged a wagon (the loader, the unloader, or the rail operator), the only evidence that matters is what the wagon looked like before and after a handling event. Manual inspection at every gantry takes 5–8 hours per train, is vulnerable to inspector fatigue, and produces inconsistent records. WDD was commissioned to make this inspection automatic, repeatable, and economically viable at scale.

The challenge

What was wrong with the earlier inspection approach?

The earlier process — partly manual, partly AI-assisted — surfaced six limitations that the WDD revision was specifically engineered to eliminate:

Key challenges

5–8 hours per train — dominated by manual frame selection from raw multi-camera footage.
Manual intervention at every stage — frame triage, duplicate removal, cross-view matching, and damage labelling required human effort throughout.
Duplicate wagon frames — the same wagon appeared repeatedly across views, requiring human verification to consolidate.
Inconsistent detection across views — results differed across the six camera views, especially under varied lighting and motion conditions.
Missed wagons in some streams — when the train ran fast, halted, or paused mid-gantry, wagons were missed in individual streams.
No reliable entry-vs-exit comparison — there was no automated mechanism to compare a wagon's condition at entry against its condition at exit.

The solution

How does the WDD platform work?

WDD is a fully automated, multi-stage computer-vision system. It ingests six concurrent video streams from the entry and exit gantries (top, left, and right views at each), runs a 17-stage detection-and-recovery pipeline, and writes structured damage records into a database that powers the TrainVision dashboard.

Automated wagon detection and recovery

Multi-stage YOLOv11 — a filter model followed by a centring model identifies wagon frames across all six camera views simultaneously
99% validation accuracy — side-view and top-view models reliably separate wagons from engines, the last car, and background frames
Automatic duplicate elimination — cooldown windows remove duplicate frames without manual triage
Missed-wagon recovery — rolling-median gap analysis with a 5× median threshold uses PaddleOCR-read timestamps to recover missed wagons from adjacent views
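The triage and duplicate-elimination steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not WDD's implementation: it assumes the filter and centring models have already produced one hypothetical `Detection` record per frame (frame index, class label, normalised horizontal box centre). Non-wagon frames are dropped, a new wagon starts whenever the gap since the previous wagon detection exceeds the cooldown, and within each wagon only the best-centred frame is kept. The 15-frame cooldown and the class names are illustrative values.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Detection:
    frame: int   # frame index within one camera stream
    label: str   # hypothetical class names: "wagon", "engine", "last_car", "background"
    cx: float    # horizontal centre of the box, normalised to [0, 1]


def select_wagon_frames(dets: List[Detection], cooldown: int = 15) -> List[int]:
    """Return one frame index per wagon: the best-centred frame in each run of detections."""
    wagons = sorted((d for d in dets if d.label == "wagon"), key=lambda d: d.frame)
    best: List[Detection] = []
    last_frame: Optional[int] = None
    for d in wagons:
        if last_frame is None or d.frame - last_frame > cooldown:
            # Gap since the previous wagon detection exceeds the cooldown: a new wagon.
            best.append(d)
        elif abs(d.cx - 0.5) < abs(best[-1].cx - 0.5):
            # Same wagon still passing: keep whichever frame is closer to the centre.
            best[-1] = d
        last_frame = d.frame
    return [d.frame for d in best]
```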

Wagon number reading (OCR)

PaddleOCR with CLAHE preprocessing — contrast enhancement, blur, thresholding, and 2.5× upscaling before OCR
98% accuracy in the dashboard — wagon-number reads reconciled across left, right, and top views
Cross-view validation — readings reconciled before being committed to the database
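Cross-view reconciliation can be illustrated with a simple majority vote. The sketch below is an assumption about how such a rule might look, not the documented WDD logic: each view contributes one raw OCR string (or `None`), a regular expression strips everything except digits, and a number is accepted only when at least two views agree. The function names and the two-view quorum are hypothetical.

```python
import re
from collections import Counter
from typing import Dict, Optional


def clean_read(raw: Optional[str]) -> Optional[str]:
    """Drop everything except digits from a raw OCR read (spaces, dots, misread letters)."""
    if not raw:
        return None
    digits = re.sub(r"\D", "", raw)
    return digits or None


def reconcile_wagon_number(reads: Dict[str, Optional[str]], quorum: int = 2) -> Optional[str]:
    """Return the number that at least `quorum` views agree on, or None if no quorum."""
    cleaned = [c for c in (clean_read(r) for r in reads.values()) if c]
    if not cleaned:
        return None
    number, votes = Counter(cleaned).most_common(1)[0]
    return number if votes >= quorum else None
```

Returning `None` rather than guessing keeps a disagreement visible, so in a design like this the record can be held back instead of being committed with a wrong number.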

Damage detection and entry-vs-exit comparison

Side-view damage — YOLO-Seg-V11L trained on closed-door, open-door, missing-door, and dent classes
Top-view damage — YOLO-Seg-V11m trained on bulges, cracks, and holes
Gravel residue — YOLO-OBB-L segmentation produces a percentage estimate of leftover material
Entry-vs-exit ledger — every damage labelled Old (present at both), New (only at exit), or Resolved (present at entry, no longer at exit)
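Once entry and exit records share a wagon number, the Old/New/Resolved labelling reduces to a set comparison. The sketch below assumes each detection has been reduced to a `(damage_class, region)` key, where `region` is a hypothetical coarse location such as a panel identifier. How WDD actually decides that an entry damage and an exit damage are the same physical defect is not described here, so that matching key is an assumption.

```python
from typing import Dict, Iterable, Tuple

DamageKey = Tuple[str, str]  # (damage class, hypothetical region id such as "left-panel-2")


def build_ledger(entry: Iterable[DamageKey], exit_: Iterable[DamageKey]) -> Dict[DamageKey, str]:
    """Label each damage Old, New, or Resolved by comparing entry and exit detections."""
    at_entry, at_exit = set(entry), set(exit_)
    ledger: Dict[DamageKey, str] = {}
    for key in at_entry & at_exit:
        ledger[key] = "Old"       # present at both gantries
    for key in at_exit - at_entry:
        ledger[key] = "New"       # appears only at exit
    for key in at_entry - at_exit:
        ledger[key] = "Resolved"  # seen at entry, no longer at exit
    return ledger
```

Keying on class plus region means a dent whose detected position shifts between gantries would appear as one Resolved and one New entry; a production matcher would need some spatial tolerance, which is why the key is flagged as an assumption.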

Video demo

See WDD in action

A walkthrough of the TrainVision dashboard — six-stream upload, 17-stage pipeline execution, entry-vs-exit damage comparison, and the per-wagon drill-down used to settle accountability disputes.


Demo. TrainVision dashboard walkthrough: uploading six camera views, running the 17-stage WDD pipeline, and producing the entry-vs-exit ledger that classifies every damage as Old, New, or Resolved.

Six-stream ingestion — entry and exit gantries' top, left, and right views uploaded per train
Automated frame triage — YOLOv11 filter and centring models eliminate background, engine, and duplicate frames
OCR-driven matching — PaddleOCR reads wagon numbers to align entry and exit records
Entry-vs-exit ledger — each damage labelled Old, New, or Resolved with replayable evidence frames

Solution architecture

What is the architecture of the WDD pipeline?

The WDD pipeline is structured as five visible stages, with seventeen internal sub-stages handling detection, recovery, and reconciliation. Multi-camera capture streams six video feeds (entry top/left/right + exit top/left/right) into S3-compatible storage. Frame extraction filters background and selects centred wagon frames. AI damage detection runs three model families in parallel (side-view, top-view, gravel segmentation). The entry-vs-exit comparison engine aligns records via PaddleOCR-read wagon numbers and labels each damage Old, New, or Resolved. The TrainVision dashboard reads from a dual database — PostgreSQL for structured reporting and MongoDB for raw metadata — with Celery for parallel processing and Redis for snapshot caching.
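The parallel damage-detection stage can be sketched with Python's standard `concurrent.futures` module standing in for Celery workers. The three detector functions below are hypothetical stubs that return placeholder results; in production each would wrap one of the side-view, top-view, or gravel models. The frame identifiers and result fields are illustrative assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple


# Hypothetical stubs for the three model families; real versions would run YOLOv11 inference.
def detect_side(frame_id: str) -> dict:
    return {"model": "side", "frame": frame_id, "damages": []}


def detect_top(frame_id: str) -> dict:
    return {"model": "top", "frame": frame_id, "damages": []}


def detect_gravel(frame_id: str) -> dict:
    return {"model": "gravel", "frame": frame_id, "residue_pct": 0.0}


def run_detectors(frames: Dict[str, str]) -> List[dict]:
    """Submit all three detectors concurrently and collect results in submission order."""
    jobs: List[Tuple[Callable[[str], dict], str]] = [
        (detect_side, frames["side"]),   # side-view damage classes
        (detect_top, frames["top"]),     # top-view damage classes
        (detect_gravel, frames["top"]),  # gravel residue is a top-view class
    ]
    with ThreadPoolExecutor(max_workers=len(jobs)) as pool:
        futures = [pool.submit(fn, frame) for fn, frame in jobs]
        return [f.result() for f in futures]
```

Threads are only a stand-in here. In Celery itself, the same fan-out would typically be expressed as a `group` of tasks, and the default worker pool runs tasks in separate processes, which suits GPU-bound inference better than threads.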

Figure 1. Multi-camera AI inspection pipeline: five visible stages over a 17-stage automated detection workflow.

Designed around constraints

How is WDD engineered for real-world video conditions?

Three constraints shaped the engineering choices: operational cost minimisation, transparency and accountability, and the unforgiving reality of railway video conditions.

Operational cost minimisation

Self-hosted models rather than cloud LLMs to keep per-train compute bounded
Parallel Celery workers for damage processing rather than serial execution
Redis snapshot cache so re-opening an inspection costs effectively nothing
2-hour processing target on GPU hardware, sized to handle ~12 trains per day per machine
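The ~12 trains per day figure is consistent with the 2-hour target under one simplifying assumption: a single GPU machine processes trains back to back with no idle time. Arrival patterns and queueing could lower the effective figure in practice.

```latex
\text{trains per day per machine} \approx \frac{24\ \text{h/day}}{2\ \text{h/train}} = 12
```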

Transparency and accountability

Every damage annotation traceable to a specific frame, camera view, and timestamp
Entry-vs-exit ledger auditable end-to-end — reviewers can replay the exact frames the AI used
Dispute resolution backed by structured evidence rather than inspector recollection

Real-world video conditions

Rolling-median gap detection for trains that halt or pause mid-gantry
5× median threshold to recover wagons missed in a stream when the train ran fast
CLAHE preprocessing and 2.5× upscale for low-contrast OCR at dawn and dusk
Regex-based artifact removal for noisy timestamp reads on dirty or tarpaulin-covered panels
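The gap-recovery logic can be sketched as follows. It assumes timestamps have already been read by OCR from each selected wagon frame; the `HH:MM:SS` format, the regex cleanup rule (including treating `.` as a misread `:`), and the 5-gap rolling window are illustrative assumptions, and only the 5× median threshold comes from the pipeline description. A gap between consecutive wagons that exceeds 5× the median of the preceding gaps is flagged as a candidate window.

```python
import re
from statistics import median
from typing import List, Optional, Tuple


def parse_timestamp(raw: str) -> Optional[int]:
    """Clean a noisy HH:MM:SS OCR read and return seconds since midnight, or None."""
    cleaned = re.sub(r"[^0-9:]", "", raw.replace(".", ":"))  # assume '.' is a misread ':'
    m = re.fullmatch(r"(\d{1,2}):(\d{2}):(\d{2})", cleaned)
    if m is None:
        return None
    hours, minutes, seconds = (int(g) for g in m.groups())
    return hours * 3600 + minutes * 60 + seconds


def flag_gaps(times: List[int], window: int = 5, factor: float = 5.0) -> List[Tuple[int, int]]:
    """Return (start, end) pairs whose gap exceeds `factor` times the rolling median gap."""
    gaps = [b - a for a, b in zip(times, times[1:])]
    flagged: List[Tuple[int, int]] = []
    for i, gap in enumerate(gaps):
        history = gaps[max(0, i - window):i]  # up to `window` preceding gaps
        if len(history) < 2:
            continue  # too little history for a meaningful median
        if gap > factor * median(history):
            flagged.append((times[i], times[i + 1]))
    return flagged
```

A flagged window is only a candidate: a train that halts produces the same long gap with no wagon missing, which is why such a window would be checked against the adjacent camera views rather than treated as a missed wagon outright.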

Impact & outcomes

What measurable impact has WDD delivered?

In the last delivery window, the project shipped twelve major engineering tasks across UI, backend, and data infrastructure. The numbers below are measured against the earlier semi-manual baseline.

Operational outcomes

Processing time per train: 2 hours, down from 5–8 hours previously
Manual intervention during processing: zero; the pipeline runs end-to-end without an operator in the loop
Manual review time: reduced by 80–90%
Duplicate wagon handling: 100% automatic elimination
Cross-view consistency: 100% — every wagon reconciled across left, right, and top views

Model and detection performance

Frame detection rate: 99.2%
Side-view and top-view wagon presence detection: 99% validation accuracy (YOLOv11-L)
Wagon number OCR in dashboard: 98% accuracy
Closed-door classification on side view: 93% validation accuracy
Frame extraction accuracy improvement: +35% over the prior approach
Processing speed: approximately 2× faster than the prior pipeline

Dashboard and infrastructure

Dual-database architecture (PostgreSQL + MongoDB) live in production
Celery task manager with multi-worker parallel damage processing operational
Redis server-level snapshot cache enables instant inspection re-open
Manual → Automatic toggle: first run executes the pipeline, subsequent switches load from Redis
Page-memory state survives navigation; comparison entries persist correctly to the database
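The Redis snapshot cache and the Manual → Automatic toggle listed above follow a get-or-compute pattern. The sketch below uses a plain dictionary as a stand-in for Redis so it runs without a server; `run_pipeline` is a hypothetical placeholder for the full 17-stage job, and the key format is an assumption. With the `redis` Python client, the dictionary lookups would map onto its `get` and `set` calls on a serialised snapshot.

```python
import json
from typing import Callable, Dict


class SnapshotCache:
    """Get-or-compute snapshot cache; a dict stands in for the Redis server."""

    def __init__(self) -> None:
        self._store: Dict[str, str] = {}  # key -> JSON-serialised snapshot

    def get_or_compute(self, train_id: str, compute: Callable[[str], dict]) -> dict:
        key = f"wdd:snapshot:{train_id}"  # hypothetical key format
        cached = self._store.get(key)
        if cached is not None:
            return json.loads(cached)      # later opens: load the stored snapshot
        snapshot = compute(train_id)       # first open: run the pipeline
        self._store[key] = json.dumps(snapshot)
        return snapshot


# Usage with a hypothetical stand-in for the 17-stage pipeline.
pipeline_runs = []


def run_pipeline(train_id: str) -> dict:
    pipeline_runs.append(train_id)
    return {"train": train_id, "status": "complete"}


cache = SnapshotCache()
first = cache.get_or_compute("T-001", run_pipeline)
second = cache.get_or_compute("T-001", run_pipeline)
```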

Frequently asked questions

Wagon Damage Detection — frequently asked questions

Plain-English answers to the questions teams ask most often when evaluating the WDD platform — designed to be quoted directly by AI search engines and human readers alike.

What is the Wagon Damage Detection (WDD) platform?

WDD is an AI-powered, multi-camera computer-vision platform built by AiSPRY for Indian Railways. It automatically inspects goods carriage wagons at entry and exit gantries, detects eight classes of damage, reads wagon numbers via OCR, and produces an entry-vs-exit comparison report — replacing 5–8 hours of manual review per train with roughly 2 hours of automated processing.

What damage classes does the system detect?

Eight damage classes across two camera perspectives. From the side view: Closed Door, Open Door, Missing Door, and Dents. From the top view: Bulges, Hole, Crack, and Gravel residue. Following a recent scope review, scratches were removed from the taxonomy to keep the system focused on damages that drive maintenance and accountability decisions.

How does the system tell new damage apart from pre-existing damage?

Each wagon is photographed twice — once at the entry gantry and once at the exit gantry. The system reads the wagon number on the side panel via PaddleOCR, matches the entry record to its exit record, and runs an automated comparison. Each detected damage is labelled Old (present in both), New (only at exit), or Resolved (present at entry, no longer at exit). This produces the audit trail used to settle accountability disputes between loaders, unloaders, and the rail operator.

What was wrong with the earlier approach the WDD platform replaced?

The earlier process required 5–8 hours of manual review per train, including manual frame selection, duplicate removal, and cross-view matching. Detection results were inconsistent across camera views, duplicate wagon frames appeared frequently, and some wagons were missed entirely when the train ran fast or halted. The revised 17-stage pipeline replaces all of those manual steps with automated YOLOv11 filtering, rolling-median gap analysis, and PaddleOCR timestamp recovery.

What accuracy gains has the new pipeline produced?

Frame extraction accuracy improved by 35% over the prior approach, processing speed roughly doubled, the frame detection rate reached 99.2%, and wagon-number OCR in the dashboard reached 98% accuracy. Manual effort dropped by 80% and total manual review time fell by 80–90%, while duplicate handling and cross-view consistency both reached 100%.