Large language models have shown impressive capabilities in writing poetry and generating viable computer programs, but research suggests that the coherent understanding of the world these abilities seem to imply is less robust than it appears.
A new study found that a popular type of generative AI model, the transformer, can provide turn-by-turn driving directions in New York City with near-perfect accuracy without forming an accurate internal map of the city. However, when the researchers added detours or closed some streets, its performance plummeted.
The researchers discovered that the maps the model had implicitly formed contained many nonexistent streets curving between the grid and connecting faraway intersections. This limitation has serious implications for deploying generative AI models in the real world: a model's performance can deteriorate rapidly when the task or environment changes even slightly.
To evaluate the coherence of a transformer's world model, the researchers developed two new metrics: sequence distinction and sequence compression. Sequence distinction tests whether a model recognizes that two sequences leading to different states are in fact different; sequence compression tests whether it recognizes that two sequences leading to the same state share the same set of possible next steps. Together, the metrics probe whether a transformer has internalized the rules governing a particular problem.
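The intuition behind the two metrics can be illustrated with a minimal sketch. This is not the paper's implementation: the toy 2x2 grid world, the `true_state` transition function, and the stand-in "model" (any function mapping a move sequence to a predicted set of next moves) are all assumptions introduced here for illustration.

```python
from itertools import product

# Toy ground-truth world: a 2x2 grid. A state is an (x, y) intersection,
# and moves are N/S/E/W; moves off the grid are treated as no-ops.
def true_state(seq, start=(0, 0), size=2):
    x, y = start
    deltas = {"N": (0, 1), "S": (0, -1), "E": (1, 0), "W": (-1, 0)}
    for m in seq:
        dx, dy = deltas[m]
        nx, ny = x + dx, y + dy
        if 0 <= nx < size and 0 <= ny < size:
            x, y = nx, ny
    return (x, y)

def legal_moves(state, size=2):
    """True set of moves available at a grid state."""
    x, y = state
    out = set()
    if y + 1 < size: out.add("N")
    if y - 1 >= 0:   out.add("S")
    if x + 1 < size: out.add("E")
    if x - 1 >= 0:   out.add("W")
    return out

def sequence_distinction(model, seqs):
    """Fraction of sequence pairs ending in DIFFERENT true states for
    which the model's predicted next-move sets also differ."""
    pairs = [(a, b) for a, b in product(seqs, seqs)
             if true_state(a) != true_state(b)]
    hits = sum(model(a) != model(b) for a, b in pairs)
    return hits / len(pairs) if pairs else 1.0

def sequence_compression(model, seqs):
    """Fraction of distinct sequence pairs ending in the SAME true state
    for which the model predicts identical next-move sets."""
    pairs = [(a, b) for a, b in product(seqs, seqs)
             if a != b and true_state(a) == true_state(b)]
    hits = sum(model(a) == model(b) for a, b in pairs)
    return hits / len(pairs) if pairs else 1.0

# A model that truly recovers the state scores 1.0 on both metrics.
perfect = lambda seq: legal_moves(true_state(seq))
seqs = ["", "N", "E", "NE", "EN", "NS", "NEN"]
print(sequence_distinction(perfect, seqs))  # 1.0
print(sequence_compression(perfect, seqs))  # 1.0
```

A model that merely memorizes surface patterns can still score well on next-move accuracy while failing these checks, which is what makes the metrics a sharper probe of the world model than task performance alone.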
Interestingly, transformers trained on data generated from randomly produced sequences formed more accurate world models than those trained on data produced by strategies, perhaps because random choices exposed the model to a wider variety of possible next steps during training.
Despite the impressive task performance of these models, only one transformer formed a coherent world model for Othello moves, and none formed a coherent world model for the wayfinding examples.
The researchers made these limitations concrete by adding detours to the map of New York City, which caused every navigation model to fail. The city maps recovered from the models often contained errors such as randomly oriented flyovers and streets with impossible orientations.
To capture accurate world models, scientists need to take a different approach when building large language models. The researchers hope that their evaluation metrics and findings will encourage others to think carefully about the limitations of these models and avoid relying solely on intuition when assessing their capabilities.
Source: https://news.mit.edu/2024/generative-ai-lacks-coherent-world-understanding-1105