Scientists have created an artificial intelligence (AI) language model called ESM3 that can simulate 500 million years of evolution in protein design. Trained on tokens generated by evolution, ESM3 is a frontier multimodal generative language model that can generate functional proteins far removed from any known ones.
ESM3 can follow complex prompts and is highly responsive to biological alignment, making it an effective tool for simulating evolutionary processes. In one experiment, the team prompted ESM3 to generate fluorescent proteins and obtained a bright protein with only 58% sequence identity to known fluorescent proteins. Natural fluorescent proteins that far apart are separated by over five hundred million years of evolution.
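To make the 58% figure concrete, here is a minimal sketch of how a percent identity between two already-aligned sequences can be computed. It is illustrative only: the function name `percent_identity`, the gap character, and the choice to count only ungapped columns are assumptions, not the procedure used by the ESM3 team, and other definitions of percent identity use a different denominator.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of ungapped alignment columns where both residues match.

    Assumes the two sequences have already been aligned (equal length,
    with `gap` marking insertions/deletions). Hypothetical helper.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    compared = 0  # columns where neither sequence has a gap
    matches = 0   # compared columns with identical residues
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            matches += 1

    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy example with made-up sequences (not real fluorescent proteins).
toy_identity = percent_identity("MSKGEELFTG", "MSKGAELF-G")
```

Under this definition, a value of 58% would mean that a little over half of the compared positions carry the same amino acid in both proteins.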
This breakthrough is significant because it suggests that language models can be used to understand the fundamental language of protein biology. Researchers have catalogued billions of protein sequences and structures, and the patterns of variation these reveal across life are the kind of regularity that language models can learn.
ESM3 was trained at three scales, with 1.4B, 7B, and 98B parameters, and its performance improves as training compute increases. The team also used ESM3 to generate high-quality, diverse proteins, showcasing its potential for simulating evolutionary processes in protein design.
Source: https://astrobiology.com/2025/01/simulating-500-million-years-of-evolution-with-a-language-model.html