According to an analysis by The Register, the true cost of training DeepSeek’s flagship model is roughly $5.87 million, far higher than the $294,000 initially reported. The initial figure comes from the supplementary information released alongside the original paper, which mentions 512 GPUs used to train the preliminary R1-Zero release. However, this accounts for only a small part of the total cost.
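The gap between the two figures is easier to judge as simple arithmetic. The following is a minimal sketch that uses only the two dollar amounts quoted above; the function name `cost_breakdown` and the dictionary keys are hypothetical, chosen purely for illustration.

```python
# Figures quoted in the text above, in US dollars.
INITIAL_REPORTED_USD = 294_000
TOTAL_ESTIMATED_USD = 5_870_000


def cost_breakdown(initial: int, total: int) -> dict:
    """Compare an initially reported cost with a larger total estimate.

    Hypothetical helper for illustration only; it is not taken from
    the paper or from the cited article.
    """
    return {
        # Portion of the total not covered by the initial figure.
        "remainder_usd": total - initial,
        # Fraction of the total that the initial figure represents.
        "initial_share": initial / total,
        # How many times larger the total is than the initial figure.
        "multiple": total / initial,
    }


breakdown = cost_breakdown(INITIAL_REPORTED_USD, TOTAL_ESTIMATED_USD)
print(f"Not covered by the initial figure: ${breakdown['remainder_usd']:,}")
print(f"Initial figure as share of total: {breakdown['initial_share']:.1%}")
print(f"Total is about {breakdown['multiple']:.0f}x the initial figure")
```

On these numbers, the initially reported figure is about 5% of the total, which is roughly 20 times larger.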
The $294,000 figure covers only reinforcement learning, a post-training process that imbues the existing V3 base model with “reasoning” capabilities. The real cost also includes the compute resources needed to train that base model in the first place. The paper focused on the application of Group Relative Policy Optimization (GRPO), the specific reinforcement learning technique used in DeepSeek’s training.
To put this into perspective, the actual cost is comparable to that of other AI models such as Meta’s Llama 4, which was trained on significantly more tokens while using less compute. This suggests that DeepSeek’s model may not be as efficient or cost-effective as initially claimed.
Source: https://www.theregister.com/2025/09/19/deepseek_cost_train