Identifying Surgical Instruments in Pedagogical Cataract Surgery Videos through an Optimized Aggregation Network

doi:10.48550/arXiv.2501.02618

Instructional cataract surgery videos let ophthalmologists and trainees review surgical details repeatedly. This paper presents a deep learning model for real-time identification of surgical instruments in such videos, trained on a custom dataset scraped from open-access sources. Inspired by the YOLOv9 architecture, the model combines a Programmable Gradient Information (PGI) mechanism with a novel Generally-Optimized Efficient Layer Aggregation Network (Go-ELAN) to address the information bottleneck problem, improving mean Average Precision (mAP) at higher Non-Maximum Suppression Intersection over Union (NMS IoU) thresholds. Evaluated against YOLO v5, v7, v8, vanilla v9, Laptool, and DETR on a dataset of 615 images covering 10 instrument classes, the Go-ELAN YOLOv9 model achieves a superior mAP of 73.74 at IoU 0.5, demonstrating the effectiveness of the proposed model.
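For readers unfamiliar with the IoU threshold behind the reported mAP figure, the following is a minimal illustrative sketch of how Intersection over Union is commonly computed for two axis-aligned bounding boxes. It is not taken from the paper: the function names `iou` and `matches_at_threshold`, and the `(x1, y1, x2, y2)` box format, are assumptions made for illustration.

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes.

    Boxes are (x1, y1, x2, y2) with x1 < x2 and y1 < y2 (assumed format).
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap rectangle; zero if the boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    # Union = sum of both areas minus the doubly counted overlap.
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter

    return inter / union if union > 0 else 0.0


def matches_at_threshold(pred_box, gt_box, threshold=0.5):
    """Hypothetical helper: a prediction counts as a match when IoU >= threshold."""
    return iou(pred_box, gt_box) >= threshold
```

Under this convention, "mAP at IoU 0.5" means a predicted instrument box is counted as correct only when it overlaps a ground-truth box of the same class with an IoU of at least 0.5.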

Publication: arXiv e-prints

Pub Date: January 2025

DOI: 10.48550/arXiv.2501.02618

arXiv: arXiv:2501.02618

Bibcode: 2025arXiv250102618S

Keywords: Computer Science - Computer Vision and Pattern Recognition; 68T05; 68T10; I.5

E-Print: Preprint. Full paper accepted at the IEEE International Conference on Image Processing Applications and Systems (IPAS), Lyon, France, Jan 2025. 6 pages
