Project Team: Scott Bainbridge, Mathew Wyatt, Leanne Currey-Randal, Australian Institute of Marine Science (AIMS); Dan Marrable, Mauro Radaelli, Curtin Institute for Computation (CIC).
As shallow-water fish continue to be impacted by human activities and environmental change, developing robust, scalable monitoring methods is critical to understanding and sustaining keystone species and species of commercial importance.
Baited Remote Underwater Video Stations (BRUVS) have become a standard tool for surveying fish around the world. BRUVS platforms are extremely simple to use in the field: cameras record fish attracted by a bait bag. However, the resulting video needs to be manually analysed for the number of target species, with every hour recorded in the field equalling at least an hour of manual analysis by trained fish ecologists. This is clearly not scalable, and the need to automate the identification of recorded fish has therefore become a major priority.
A BRUVS consists of a steel frame with one or two mounted video cameras (GoPros or small Sony camcorders in mono configuration) and a bait bag attached by a flexible pole to attract fish (pictured right). BRUVS were deployed in shallow reef environments down to 50 m (n=20) and recorded for 1 hour. To form a training set, >3500 images were extracted from high-definition (1920×1080, 30 fps) videos. Each image was analysed and labelled by fish ecologists for four target species: coral trout, starry triggerfish, tropical snapper and stripy bass. The number of cropped images ranged from ~20 to >2000 per video per species.
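As a rough illustration of how still frames might be drawn from a one-hour, 30 fps deployment for labelling, the sketch below computes evenly spaced frame indices. The sampling interval (one frame every 30 seconds) is an assumption for illustration, not the project's actual extraction rate; in practice a tool such as OpenCV or ffmpeg would read the frames at these indices.

```python
# Hypothetical sketch: choosing which frames to extract from a BRUVS video
# for manual labelling. The one-frame-per-30-seconds interval is an assumption.
def sample_frame_indices(duration_s=3600, fps=30, every_s=30):
    """Return frame indices to extract: one frame every `every_s` seconds."""
    total_frames = duration_s * fps   # 108,000 frames in a 1-hour, 30 fps video
    step = every_s * fps              # frames between successive samples
    return list(range(0, total_frames, step))

indices = sample_frame_indices()
# len(indices) == 120 frames from a 1-hour deployment at this interval
```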
You Only Look Once (YOLO), a single-stage convolutional neural network object detector, was trained on the labelled images (60% training, 20% testing, 20% validation) on the Pawsey Supercomputing Centre’s Nimbus cloud platform, using an NVIDIA V100 GPU on an eight-core virtual machine. Unlike region-based approaches (R-CNNs), YOLO predicts bounding boxes and classes in a single pass over the image, which makes it both very fast and accurate.
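The 60/20/20 train/test/validation split mentioned above can be sketched as a simple seeded shuffle and partition. The file list and random seed here are illustrative assumptions, not the project's actual pipeline.

```python
# A minimal sketch of a 60/20/20 train/test/validation split of labelled
# images. The seed and the item list are illustrative assumptions.
import random

def split_dataset(items, train=0.6, test=0.2, seed=42):
    """Shuffle labelled items and partition into train/test/validation sets."""
    items = list(items)
    random.Random(seed).shuffle(items)  # seeded, so the split is reproducible
    n = len(items)
    n_train = int(n * train)
    n_test = int(n * test)
    return (items[:n_train],
            items[n_train:n_train + n_test],
            items[n_train + n_test:])

train_set, test_set, val_set = split_dataset(range(3500))
# sizes: 2100 / 700 / 700
```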
While the model is still being trained on the full corpus of images, initial results from a four-class model trained on approximately 15% of the training images show greater than 90% accuracy on the validation data set (see pictures below). YOLO’s efficient algorithm is capable of processing each frame of the video footage, at full resolution, faster than real time: on an NVIDIA V100, the model processed video at approximately 45 frames per second.
Future work will focus on training the model on the full training data set and validating this method against the ecologist-derived annotations. This research will eventually result in faster analysis, better application of human expertise and, importantly, a pathway to scaling the observation of shallow-water fish communities.
An early preview of the video is available on YouTube.