ABSTRACT

Blind people lead normal lives and have developed their own ways of carrying out everyday tasks. Nevertheless, they face real difficulties arising from social barriers and inaccessible infrastructure. The most difficult challenge for a visually impaired person, particularly one with total vision loss, is navigating their surroundings, and blind persons likewise struggle to locate objects around them. This motivated the development of a real-time object detection (RTOD) system. The goal is to review object detection, tracking, and recognition techniques, feature descriptors, and segmentation methods that operate on video frames, together with various tracking technologies. Standard approaches are based on CNNs, R-CNN, Faster R-CNN, and YOLO. The proposed architecture is a convolutional neural network that uses YOLOv3 under the hood, trained on an extended COCO dataset. The model generates a voice output naming each object detected in the video clip while also providing its quadrant information. The proposed model can help visually impaired users understand their surroundings by supplying information they are otherwise deprived of. Training accuracy, validation accuracy, and validation loss were measured: the training accuracy is 99.795%, the validation accuracy is high (96.60%), and the validation loss is very low (0.0035).
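
The quadrant-plus-voice output described above can be made concrete with a short sketch. This is a minimal illustration under stated assumptions, not the paper's implementation: the YOLOv3 inference step is omitted (its detections are represented here as plain label/bounding-box pairs), and pyttsx3 is assumed only as one readily available offline text-to-speech library.

```python
# Minimal sketch of the quadrant + voice-output step described in the
# abstract. The YOLOv3 detector itself is not shown; `detections` stands
# in for whatever (label, box) pairs the model's inference call returns.
# pyttsx3 is an assumed choice of offline text-to-speech engine.
import pyttsx3

def quadrant(box, frame_w, frame_h):
    """Map a bounding box (x, y, w, h) to one of four screen quadrants,
    based on where its centre falls relative to the frame centre."""
    cx = box[0] + box[2] / 2
    cy = box[1] + box[3] / 2
    horiz = "left" if cx < frame_w / 2 else "right"
    vert = "top" if cy < frame_h / 2 else "bottom"
    return f"{vert}-{horiz}"

def announce(detections, frame_w, frame_h):
    """Speak each detected label together with its quadrant."""
    engine = pyttsx3.init()
    for label, box in detections:
        engine.say(f"{label} in the {quadrant(box, frame_w, frame_h)}")
    engine.runAndWait()

# Example: a 'person' whose box centre lies in the top-left quadrant of
# a 640x480 frame is announced as "person in the top-left".
announce([("person", (50, 40, 100, 120))], 640, 480)
```

The quadrant is computed from the box centre rather than a corner so that large objects spanning a boundary are still assigned to the quadrant where most of the object lies.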