Per-pixel bounding-box regression + DBSCAN for handwritten word detection - visual walkthrough of WordDetectorNet [P]
In three lines: WordDetectorNet detects handwritten words with per-pixel bounding-box distance regression followed by DBSCAN clustering. Each pixel classified as a word pixel regresses 4 scalar distances to the edges of its word's bounding box, generating thousands of candidate boxes that are merged via DBSCAN with distance = 1 − IoU. Architecture: ResNet18 → FPN-style decoder → 6 output channels per pixel (2 segmentation logits + 4 distances). Trained on IAM, 448×448 input → 224×224 output.
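The decode-and-cluster step described in the summary can be sketched in plain Python. This is a minimal illustration, not the project's actual code: the channel order (top, bottom, left, right), the `eps` and `min_samples` values, the median merge of each cluster, and all function names are assumptions made for the example, and the small DBSCAN here is a hand-written stand-in for a library implementation.

```python
from statistics import median

def decode_boxes(word_mask, dists):
    """Turn per-pixel distances into candidate boxes (x0, y0, x1, y1).
    dists[y][x] = (top, bottom, left, right); this order is an assumption."""
    boxes = []
    for y, row in enumerate(word_mask):
        for x, is_word in enumerate(row):
            if is_word:
                top, bottom, left, right = dists[y][x]
                boxes.append((x - left, y - top, x + right, y + bottom))
    return boxes

def iou(a, b):
    """Intersection over union of two axis-aligned boxes."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union

def dbscan(boxes, eps=0.5, min_samples=3):
    """Minimal DBSCAN with distance = 1 - IoU. Returns labels, -1 = noise.
    eps and min_samples are illustrative values, not taken from the source."""
    n = len(boxes)
    neighbors = [[j for j in range(n) if 1.0 - iou(boxes[i], boxes[j]) <= eps]
                 for i in range(n)]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_samples:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_samples:
                queue.extend(neighbors[j])  # j is a core point: keep expanding
    return labels

def merge_clusters(boxes, labels):
    """Collapse each cluster to one box via the per-coordinate median (assumed)."""
    merged = []
    for c in sorted(set(labels) - {-1}):
        members = [b for b, lab in zip(boxes, labels) if lab == c]
        merged.append(tuple(median(col) for col in zip(*members)))
    return merged

# Toy 3x9 grid: two "words" with true boxes (0,0,3,2) and (5,0,8,2),
# plus one stray false-positive pixel at (4,1) with all-zero distances.
truth = [(0, 0, 3, 2), (5, 0, 8, 2)]
word_mask = [[False] * 9 for _ in range(3)]
dists = [[(0, 0, 0, 0)] * 9 for _ in range(3)]
for x0, y0, x1, y1 in truth:
    for y in range(y0, y1 + 1):
        for x in range(x0, x1 + 1):
            word_mask[y][x] = True
            dists[y][x] = (y - y0, y1 - y, x - x0, x1 - x)
word_mask[1][4] = True

boxes = decode_boxes(word_mask, dists)
labels = dbscan(boxes)
merged = merge_clusters(boxes, labels)
print(len(boxes), "candidates ->", merged)
```

On real predictions the candidates for one word differ slightly from pixel to pixel, which is why a density-based clustering on 1 − IoU is useful: near-identical boxes fall into one cluster, while isolated false positives are labelled noise and dropped.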
Summary generated by Claude — human-verified