Working with researchers at Georgia Tech and the Wild Dolphin Project (WDP), Google has made significant progress toward understanding dolphin communication. The new model, called DolphinGemma, is a foundational AI model trained to learn the structure of dolphin vocalizations and generate novel dolphin-like sound sequences.
WDP has studied dolphins for decades using a non-invasive approach that has produced a unique dataset of underwater video and audio paired with individual dolphin identities. This research has revealed specific patterns in dolphin communication, such as the signature whistles mothers and calves use to reunite.
DolphinGemma is built on Google's audio technologies, including the SoundStream tokenizer, and uses a model architecture suited to complex sequences. The ~400M-parameter model processes dolphin sounds, identifies patterns and structure, and predicts the likely subsequent sounds in a sequence, much as large language models predict the next word or token in human language.
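To make the next-token analogy concrete, here is a minimal sketch of the general approach: a SoundStream-style tokenizer turns waveforms into discrete integer codes, and a small decoder-only transformer is trained to predict the next code. Every size, name, and data point below is a placeholder for illustration, not DolphinGemma's actual architecture or training setup.

```python
import torch
import torch.nn as nn

class AudioTokenLM(nn.Module):
    """Toy autoregressive model over discrete audio tokens.

    Hypothetical stand-in for an audio-in, audio-out setup: a
    SoundStream-style tokenizer yields integer codes, and a causal
    transformer predicts the next code in the sequence.
    """
    def __init__(self, vocab_size=1024, d_model=256, n_layers=4,
                 n_heads=8, max_len=2048):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        self.pos_emb = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, tokens):
        # tokens: (batch, seq) integer audio codes from the tokenizer
        b, t = tokens.shape
        pos = torch.arange(t, device=tokens.device)
        x = self.tok_emb(tokens) + self.pos_emb(pos)
        # Causal mask: True above the diagonal blocks attention to the future.
        mask = torch.triu(torch.ones(t, t, dtype=torch.bool,
                                     device=tokens.device), diagonal=1)
        x = self.blocks(x, mask=mask)
        return self.head(x)  # logits over the next audio token

# Training step: shift-by-one next-token objective, exactly as in text LMs.
model = AudioTokenLM()
codes = torch.randint(0, 1024, (2, 128))      # placeholder token IDs
logits = model(codes[:, :-1])
loss = nn.functional.cross_entropy(
    logits.reshape(-1, 1024), codes[:, 1:].reshape(-1))
```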
WDP is beginning to deploy the model this field season, with immediate practical benefits: by flagging recurring sound types, clusters, and reliable sequences, it can help researchers uncover hidden structure and potential meanings within dolphin communication.
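One way such pattern mining could look in practice is sketched below: cluster clip embeddings into discrete sound types, then count recurring adjacent pairs as candidate sequences. The random embeddings, cluster count, and bigram heuristic are illustrative assumptions, not the project's published method.

```python
import numpy as np
from collections import Counter
from sklearn.cluster import KMeans

# Hypothetical workflow: embed short clips of recorded vocalizations
# (e.g., with a model's audio encoder), cluster them into discrete
# sound types, and look for recurring type-to-type transitions.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(500, 64))   # placeholder clip embeddings

kmeans = KMeans(n_clusters=12, n_init=10, random_state=0).fit(embeddings)
labels = kmeans.labels_                   # one discrete "sound type" per clip

# Frequent adjacent pairs hint at reliable sequences worth inspecting
# alongside the paired video record of dolphin behavior.
bigrams = Counter(zip(labels[:-1], labels[1:]))
for pair, count in bigrams.most_common(5):
    print(pair, count)
```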
Beyond analyzing natural communication, WDP is also exploring two-way interaction in the ocean with the CHAT (Cetacean Hearing Augmentation Telemetry) system, developed in partnership with Georgia Tech. CHAT associates synthetic whistles with specific objects, so dolphins that learn to mimic a whistle can in effect request the corresponding item.
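A system like this needs a matching step that decides which synthetic whistle, if any, an incoming sound resembles. Below is a minimal sketch assuming normalized cross-correlation against a small template dictionary; the object names, chirp signals, and threshold are all hypothetical and not taken from the source.

```python
import numpy as np
from scipy.signal import correlate

def normalize(x):
    """Zero-mean, unit-norm signal for scale-invariant comparison."""
    x = x - x.mean()
    return x / (np.linalg.norm(x) + 1e-9)

def best_match(recording, templates, threshold=0.6):
    """Return the template whose normalized correlation with the
    recording is highest, or None if no score clears the threshold."""
    rec = normalize(recording)
    scores = {}
    for name, tpl in templates.items():
        c = correlate(rec, normalize(tpl), mode="valid")
        scores[name] = float(np.max(np.abs(c)))
    name, score = max(scores.items(), key=lambda kv: kv[1])
    return (name, score) if score >= threshold else (None, score)

# Illustrative synthetic whistles: one frequency-sweep per object.
sr = 48_000
t = np.linspace(0, 0.5, sr // 2, endpoint=False)
templates = {
    "scarf": np.sin(2 * np.pi * (8_000 + 4_000 * t) * t),   # rising chirp
    "rope":  np.sin(2 * np.pi * (12_000 - 6_000 * t) * t),  # falling chirp
}
incoming = templates["scarf"] + 0.1 * np.random.default_rng(1).normal(size=t.size)
print(best_match(incoming, templates))   # matches "scarf" despite noise
```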
DolphinGemma will be shared as an open model this summer, allowing researchers worldwide to mine their own acoustic datasets and accelerate the search for patterns in dolphin communication. This collaboration between Google, Georgia Tech, and WDP is opening exciting new possibilities for understanding intelligent marine mammals.
Source: https://blog.google/technology/ai/dolphingemma