DeepSeek, an AI chatbot from a young Chinese company of the same name, has shaken the tech industry with its efficiency and affordability. The app’s “large language model” (LLM) boasts reasoning capabilities comparable to those of US models such as OpenAI’s o1, yet cost a fraction as much to train and run.
DeepSeek claims to have achieved this by implementing several technical strategies that reduced both computation time and memory requirements. Its base model, V3, required 2.788 million GPU-hours to train, at an estimated cost of under $6m, significantly less than OpenAI’s GPT-4, which reportedly cost over $100m.
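The headline figure can be sanity-checked with simple arithmetic. Assuming a rental rate of roughly $2 per H800 GPU-hour (the rate cited in DeepSeek's own technical report, not a figure from this article), the reported GPU-hours imply a training cost just under $6m:

```python
# Back-of-the-envelope check of DeepSeek-V3's reported training cost.
GPU_HOURS = 2_788_000       # 2.788 million H800 GPU-hours, as reported
PRICE_PER_GPU_HOUR = 2.00   # assumed rental rate in USD (per DeepSeek's technical report)

cost = GPU_HOURS * PRICE_PER_GPU_HOUR
print(f"Estimated training cost: ${cost / 1e6:.3f} million")  # ≈ $5.576 million, under $6m
```

This is only an estimate of rented-compute cost; it excludes research staff, failed experiments, and data-preparation expenses, which is one reason such comparisons with GPT-4's reported figure are debated.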
The DeepSeek models were trained on modified Nvidia H800 GPUs, possibly stockpiled before export restrictions tightened in October 2023. This constraint likely drove the company to innovate and reduce costs. By reducing computational overheads, DeepSeek aims to address environmental concerns related to AI’s high energy consumption.
The latest model’s “weights” have been openly released, along with a technical paper, allowing other groups to run the model on their own equipment and adapt it to other tasks. Researchers can now peer beneath the model’s bonnet, analysing its strengths and weaknesses.
While DeepSeek’s cost-cutting techniques are not new, its approach is drawing attention worldwide. The company’s success may spur the development of more efficient AI models built with fewer resources. Such a shift could help smaller companies and increase overall demand for AI products, potentially benefiting businesses like Nvidia in the long term.
As the tech industry continues to evolve, DeepSeek’s emergence as a game-changer highlights the potential of smaller companies to create innovative AI tools that can make lives easier.
Source: https://www.bbc.com/future/article/20250131-what-does-deepseeks-new-app-mean-for-the-future-of-ai