Deep learning and computer vision algorithms for detection and classification of bearded seal vocalizations in the Arctic Ocean
Date
2022
Publisher
University of Delaware
Abstract
Year-round recordings of bearded seal calls were collected at the northeastern edge of the Chukchi Continental Slope (Alaska) in 2016-2017, 2018-2019, and 2019-2020. Whereas underwater bearded seal vocalizations are often analyzed manually, or detected automatically and then validated manually, this work proposes two detection and classification systems (DCS) based on deep learning techniques.
The first system has two stages. First, regions of interest (ROIs) that may contain bearded seal vocalizations are found by computing the 2D normalized cross-correlation between the spectrogram of the measured signal and a representative template of each of the two main calls of interest. Second, convolutional neural networks (CNNs) validate the ROIs and classify them into one of several possible classes. The CNNs are trained on 80% of the manually labeled ROIs from one of the recorders; on the remaining 20%, they reach a validation accuracy above 95.5%. To assess how well the networks generalize, the CNNs are tested on the other recorders, which were deployed at different locations and in different years, and achieve a precision above 89.2% for the main class of each of the two call types.
The second DCS is based on YOLOv5, the latest version of the You Only Look Once (YOLO) algorithm at the time of this work. The network learns to detect and classify bearded seal vocalizations using computer vision object detection, in which bounding boxes enclose the objects of interest in an image. With this method, detection and classification are both carried out by the deep learning model without prior knowledge of the signal's specifics, so no ROIs or masks are needed. Another advantage of YOLOv5 over typical DCSs is that each predicted bounding box directly encodes statistical information about the vocalization, such as its duration, bandwidth, and center frequency, since a box drawn on a spectrogram spans the call in both time and frequency.
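The ROI search in the first DCS rests on 2D normalized cross-correlation between a spectrogram and a call template. The sketch below is a minimal, unoptimized NumPy version of zero-mean normalized cross-correlation, assuming both inputs are magnitude spectrograms (frequency bins x time frames) computed on the same grid. The function names `normalized_xcorr2d` and `find_rois`, the threshold values, and the toy template are hypothetical illustrations, not the thesis's code; how the actual templates were built and how detections were thresholded or merged is not described here.

```python
import numpy as np


def normalized_xcorr2d(spec, template):
    """Zero-mean normalized cross-correlation of `template` slid over `spec`.

    Both inputs are 2-D arrays (frequency bins x time frames). Returns an array
    of shape (F - f + 1, T - t + 1) with values in [-1, 1]; a value of 1 means
    the spectrogram patch is a positively scaled, offset copy of the template.
    """
    spec = np.asarray(spec, dtype=float)
    tpl = np.asarray(template, dtype=float)
    f, t = tpl.shape
    t0 = tpl - tpl.mean()                    # remove the template mean once
    t_norm = np.sqrt(np.sum(t0 ** 2))
    out = np.zeros((spec.shape[0] - f + 1, spec.shape[1] - t + 1))
    for i in range(out.shape[0]):            # frequency offset
        for j in range(out.shape[1]):        # time offset
            patch = spec[i:i + f, j:j + t]
            p0 = patch - patch.mean()
            denom = np.sqrt(np.sum(p0 ** 2)) * t_norm
            # Flat patches (zero variance) get 0 instead of dividing by zero.
            out[i, j] = np.sum(p0 * t0) / denom if denom > 0 else 0.0
    return out


def find_rois(ncc, threshold=0.6):
    """Hypothetical ROI picker: every (freq_offset, time_offset) at or above threshold."""
    rows, cols = np.nonzero(ncc >= threshold)
    return list(zip(rows.tolist(), cols.tolist()))


# Toy example: a made-up template embedded in an otherwise silent spectrogram.
template = np.array([[0.0, 0.0, 1.0, 3.0],
                     [0.0, 2.0, 4.0, 1.0],
                     [3.0, 5.0, 2.0, 0.0]])
spectrogram = np.zeros((8, 16))
spectrogram[2:5, 5:9] = template
ncc = normalized_xcorr2d(spectrogram, template)
print(find_rois(ncc, threshold=0.99))  # [(2, 5)]
```

Subtracting the means and dividing by the norms makes the score insensitive to overall call loudness and to a constant background level, which is why a single template can be matched against recordings with different received levels. A real spectrogram would also contain noise, so the exact-match score of 1 in this toy example would drop below 1 in practice.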
In the generalization stage, YOLOv5 achieved an accuracy of 93.87%, with precision and recall above 94.9% and 90.6%, respectively, for the eight proposed classes. Furthermore, an analysis of the vocal behavior of bearded seals revealed a geographical dependence, with this species preferring shallower water depths on the Chukchi Continental Slope.
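Accuracy, precision, and recall, as reported above, can all be read off a confusion matrix. The sketch below shows the standard definitions, assuming each evaluated call has already been reduced to a (true class, predicted class) pair. For an object detector such as YOLOv5, that reduction first requires matching predicted boxes to labeled boxes, typically with an intersection-over-union threshold, which is not shown. The function name `classification_metrics` and the 3-class matrix are hypothetical and are not data from this work.

```python
import numpy as np


def classification_metrics(confusion):
    """Overall accuracy plus per-class precision and recall.

    confusion[i, j] counts samples whose true class is i and predicted class is j.
    """
    cm = np.asarray(confusion, dtype=float)
    tp = np.diag(cm)                 # correctly classified samples per class
    predicted = cm.sum(axis=0)       # column sums: all samples predicted as class j
    actual = cm.sum(axis=1)          # row sums: all samples truly in class i
    # Classes never predicted (or never present) get 0 rather than NaN.
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, actual, out=np.zeros_like(tp), where=actual > 0)
    accuracy = tp.sum() / cm.sum()
    return accuracy, precision, recall


# Made-up 3-class confusion matrix (not data from this work).
cm = [[8, 2, 0],
      [1, 9, 0],
      [0, 0, 10]]
accuracy, precision, recall = classification_metrics(cm)
print(round(accuracy, 3))                 # 0.9
print(np.round(precision, 3).tolist())    # [0.889, 0.818, 1.0]
print(np.round(recall, 3).tolist())       # [0.8, 0.9, 1.0]
```

Precision penalizes false alarms (other classes predicted as class j), while recall penalizes missed calls (class i predicted as something else), so reporting both per class shows whether a detector errs toward over- or under-detection.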
Keywords
Bearded seals, Computer vision, Deep learning, Marine mammals, Ocean acoustics, YOLO