Reddit r/MachineLearning·23 May 2026

Per-pixel bounding-box regression + DBSCAN for handwritten word detection - visual walkthrough of WordDetectorNet [P]

In three lines

WordDetectorNet detects handwritten words by regressing a bounding box at every pixel and then clustering those boxes with DBSCAN. Each pixel classified as a word pixel regresses 4 scalar distances to the edges of its word's box, producing thousands of candidate boxes that DBSCAN merges using the distance 1 − IoU. Architecture: ResNet18 → FPN-style decoder → 6 output channels per pixel (2 segmentation logits + 4 distances). Trained on IAM with 448×448 inputs and 224×224 output maps.
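The decoding step (threshold the segmentation, turn each word pixel's 4 distances into a box, cluster with DBSCAN on 1 − IoU, merge each cluster) can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the WordDetectorNet code: the channel order (background logit, word logit, then distances to the top, bottom, left and right box edges), the 0.5 threshold, `eps=0.3`, `min_pts=2`, and averaging the boxes in each cluster are illustrative choices, and every function name is hypothetical. DBSCAN is written out with a naive O(n²) neighbor search so the 1 − IoU distance is explicit.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical per-pixel output (6 channels, as in the summary), assumed order:
# (logit_background, logit_word, d_top, d_bottom, d_left, d_right),
# with distances measured in output-map pixels from the pixel to each box edge.
PixelOut = Tuple[float, float, float, float, float, float]


@dataclass(frozen=True)
class Box:
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def area(self) -> float:
        return max(0.0, self.xmax - self.xmin) * max(0.0, self.ymax - self.ymin)


def word_prob(logit_bg: float, logit_word: float) -> float:
    """Softmax over the 2 segmentation logits, returning P(word) without overflow."""
    d = logit_bg - logit_word
    if d >= 0:
        e = math.exp(-d)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(d))


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a.xmax, b.xmax) - max(a.xmin, b.xmin))
    iy = max(0.0, min(a.ymax, b.ymax) - max(a.ymin, b.ymin))
    inter = ix * iy
    union = a.area() + b.area() - inter
    return inter / union if union > 0 else 0.0


def dbscan(boxes: Sequence[Box], eps: float, min_pts: int) -> List[int]:
    """Plain O(n^2) DBSCAN with distance 1 - IoU; label -1 marks noise."""
    n = len(boxes)
    neighbors = [[j for j in range(n) if i == j or 1.0 - iou(boxes[i], boxes[j]) <= eps]
                 for i in range(n)]
    labels: List[Optional[int]] = [None] * n  # None = not visited yet
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = -1  # provisional noise; may later join a cluster as a border point
            continue
        cluster += 1  # i is a core point: start a new cluster and grow it
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_pts:  # another core point: keep expanding
                queue.extend(neighbors[j])
    return [-1 if label is None else label for label in labels]


def merge_clusters(boxes: Sequence[Box], labels: Sequence[int]) -> List[Box]:
    """Average each cluster's member boxes (assumed merge rule); noise is dropped."""
    groups: Dict[int, List[Box]] = {}
    for box, label in zip(boxes, labels):
        if label >= 0:
            groups.setdefault(label, []).append(box)
    merged = []
    for label in sorted(groups):
        members = groups[label]
        k = len(members)
        merged.append(Box(sum(b.xmin for b in members) / k, sum(b.ymin for b in members) / k,
                          sum(b.xmax for b in members) / k, sum(b.ymax for b in members) / k))
    return merged


def detect_words(outputs: Sequence[Sequence[PixelOut]], threshold: float = 0.5,
                 eps: float = 0.3, min_pts: int = 2) -> List[Box]:
    """Word pixels -> one candidate box each -> DBSCAN on 1 - IoU -> merged boxes."""
    candidates = []
    for y, row in enumerate(outputs):
        for x, (l_bg, l_word, d_top, d_bottom, d_left, d_right) in enumerate(row):
            if word_prob(l_bg, l_word) > threshold:
                candidates.append(Box(x - d_left, y - d_top, x + d_right, y + d_bottom))
    return merge_clusters(candidates, dbscan(candidates, eps, min_pts))
```

With thousands of candidates per page, the pairwise IoU matrix is the expensive part. scikit-learn's `DBSCAN` also accepts a precomputed distance matrix (`metric="precomputed"`), which is one way to replace the hand-written clustering. If the output map is at half the input resolution (224 vs. 448), the merged boxes would need to be scaled by 2 to land in input-image coordinates.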

Tags: Vision, Code generation, Open source

Summary generated by Claude — human-verified
