High class imbalance: In any given underwater habitat, a few species are common while many more are rare. AI training data reflects this reality, with some classes heavily overrepresented and others severely underrepresented. This imbalance can bias models toward the common classes, so they perform poorly when asked to identify a rare species, which undermines conservation efforts that depend on monitoring such populations.
Domain shift: AI models trained on data from one region of the ocean often perform poorly in another because environmental conditions, lighting, and species composition differ significantly. A model trained on a clear, shallow coral reef may perform poorly in a turbid, deep-sea environment, creating a major generalization challenge for marine biologists.
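To make the class imbalance described above concrete, here is a minimal sketch of inverse-frequency class weighting, one common way to make a training loss pay more attention to rare classes. It is an illustration under stated assumptions, not a method prescribed by this article: the species names and counts are hypothetical, and the formula (number of samples divided by number of classes times the class count) is the same heuristic that scikit-learn uses for its "balanced" class weights.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Return {class: weight} where weight = N / (K * count_of_class).

    N is the total number of samples and K the number of distinct classes,
    so rare classes receive proportionally larger weights.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


# Hypothetical label list (illustrative only, not real survey data):
# one common species, one less common, one rare.
labels = ["damselfish"] * 900 + ["grouper"] * 90 + ["seahorse"] * 10

weights = inverse_frequency_weights(labels)

# Print from the most common class (smallest weight) to the rarest.
for species, weight in sorted(weights.items(), key=lambda kv: kv[1]):
    print(f"{species}: {weight:.2f}")
```

In this made-up example the rare class ends up weighted roughly ninety times more heavily than the common one. Weighting of this kind only rebalances the loss; it does not add information about the rare species, so it is usually combined with collecting more examples where possible.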
Object-related complications
Camouflage and mimicry: Many marine species have evolved sophisticated camouflage to blend in with their surroundings. This makes them inherently difficult for AI to detect and identify, as the models must be able to discern subtle features that distinguish the organism from the background.
Occlusions and overlapping objects: Underwater scenes are often cluttered, with species partially or fully hidden by vegetation, coral, or other marine life. For example, schools of fish can overlap, making it challenging for AI to count and identify individuals accurately.
Scale variation: Marine organisms span a vast range of sizes, from microscopic plankton to large whales. A single AI model may struggle to detect and classify objects accurately across such a range, and smaller objects are often overlooked because of insufficient image resolution.
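The scale problem above can be illustrated with simple pinhole-camera arithmetic: the apparent size of an object in pixels is roughly the focal length (in pixels) times the object's real size, divided by its distance from the camera. The sketch below is a rough illustration only. The focal length, the distance, the organism sizes, and the 16-pixel "resolvable" threshold are all assumed values, and the calculation ignores refraction at the camera housing, which changes the effective focal length underwater.

```python
def apparent_size_px(object_size_m, distance_m, focal_length_px):
    """Pinhole projection: size in pixels = focal_length_px * size / distance."""
    if distance_m <= 0:
        raise ValueError("distance_m must be positive")
    return focal_length_px * object_size_m / distance_m


# Assumed camera and scene parameters (hypothetical, for illustration).
FOCAL_LENGTH_PX = 1000.0   # focal length expressed in pixels
MIN_PIXELS = 16.0          # assumed minimum extent for a usable detection
DISTANCE_M = 2.0           # every subject placed 2 m from the camera

# Approximate body lengths in metres (illustrative values).
subjects = {
    "copepod (2 mm)": 0.002,
    "clownfish (10 cm)": 0.10,
    "manta ray (4 m)": 4.0,
}

sizes = {
    name: apparent_size_px(size_m, DISTANCE_M, FOCAL_LENGTH_PX)
    for name, size_m in subjects.items()
}

for name, px in sizes.items():
    status = "likely too small" if px < MIN_PIXELS else "resolvable"
    print(f"{name}: {px:.1f} px ({status})")
```

Under these assumptions the smallest subject covers about one pixel while the largest spans more than a typical image width, which is why a single model and a single camera setup rarely serve both ends of the size range well.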
Algorithmic and practical limitations
Interpretability and trust: The "black box" nature of many deep learning models makes it difficult for marine biologists to understand and trust how the AI arrives at a specific conclusion. Without clear explanations, researchers may be hesitant to rely on AI-generated data for high-stakes decisions, such as directing conservation efforts.
Computational cost: Training complex AI models requires significant energy and computational power, and deploying them is a further constraint on underwater robots, which have restricted onboard computing resources. Running the models in real time, as required for autonomous research, remains computationally challenging.
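A back-of-the-envelope estimate shows why restricted onboard hardware is limiting. The sketch below computes how much memory a model's weights alone would occupy at different numeric precisions. The parameter count is a hypothetical figure chosen for illustration, not a measurement of any particular model, and the estimate leaves out activations, input buffers, and runtime overhead, so real requirements would be higher.

```python
# Bytes needed to store one parameter at each numeric precision.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def weight_memory_mb(n_params, dtype):
    """Memory for the weights alone, in megabytes (1 MB = 1e6 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e6


# Hypothetical model size: 25 million parameters (assumed for illustration).
N_PARAMS = 25_000_000

footprint = {dtype: weight_memory_mb(N_PARAMS, dtype) for dtype in BYTES_PER_PARAM}

for dtype, mb in footprint.items():
    print(f"{dtype}: {mb:.0f} MB for weights only")
```

Storing weights at lower precision shrinks this footprint proportionally, but whether accuracy survives the reduction depends on the model and the data and would need to be checked case by case.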