NVIDIA has announced support for DeepSeek-R1, an open reasoning model from DeepSeek designed to deliver accurate and efficient real-time inference for agentic AI systems. Unlike models that respond to a query directly, DeepSeek-R1 uses a chain-of-thought approach, performing multiple inference passes over a query to arrive at the best answer.
This process, known as test-time scaling, requires significant compute resources, but the model's 671 billion parameters and support for industry-standard APIs make it an attractive option for enterprises. The DeepSeek-R1 NIM microservice is now available on build.nvidia.com, allowing developers to experiment with its capabilities and deploy it with ease.
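Test-time scaling trades extra inference compute for answer quality. As an illustration of the general idea only, here is a toy best-of-N sketch: run several inference passes and keep the highest-scoring candidate. The `generate_candidate` and `score` functions below are hypothetical stand-ins; DeepSeek-R1's actual reasoning is a learned chain-of-thought process, not this literal loop.

```python
import random


def generate_candidate(query: str, seed: int) -> str:
    """Stand-in for one inference pass; a real model would sample a
    chain-of-thought here. Returns a toy answer tagged with a digit."""
    rng = random.Random(hash((query, seed)))
    return f"answer-{rng.randint(0, 9)}"


def score(query: str, answer: str) -> float:
    """Stand-in verifier/reward model: prefer higher-numbered toy answers."""
    return float(answer.rsplit("-", 1)[1])


def best_of_n(query: str, n: int = 8) -> str:
    """Run n inference passes and keep the best-scoring answer.
    More passes (more test-time compute) -> better expected answer."""
    candidates = [generate_candidate(query, seed) for seed in range(n)]
    return max(candidates, key=lambda a: score(query, a))


print(best_of_n("What is 2 + 2?", n=8))
```

Because the n=8 run considers a superset of the candidates the n=1 run sees, its chosen answer can only score the same or higher, which is the essence of spending more compute at inference time.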
The model’s performance is made possible by the NVIDIA Hopper architecture’s FP8 Transformer Engine, which enables high-throughput inference. The upcoming NVIDIA Blackwell architecture promises a further boost for test-time scaling, delivering up to 20 petaflops of peak FP4 compute performance.
Developers can test and experiment with DeepSeek-R1 today through the API on build.nvidia.com; a downloadable NIM microservice is expected soon as part of the NVIDIA AI Enterprise software platform. By running the NIM microservice on their preferred accelerated computing infrastructure, enterprises can maximize security and data privacy.
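Since NIM exposes an industry-standard, OpenAI-compatible chat-completions API, querying the hosted preview is a small HTTP call. A minimal sketch using only the Python standard library follows; the endpoint URL and model identifier match what is published on build.nvidia.com at the time of writing, but treat them as assumptions and check the model card for current values. `NVIDIA_API_KEY` is a hypothetical environment variable name for your build.nvidia.com key.

```python
import json
import os
import urllib.request

# Assumed hosted-preview endpoint and model ID; verify on build.nvidia.com.
NIM_URL = "https://integrate.api.nvidia.com/v1/chat/completions"
MODEL = "deepseek-ai/deepseek-r1"


def build_request(prompt: str, max_tokens: int = 1024) -> dict:
    """Assemble an OpenAI-compatible chat-completions payload."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": 0.6,
    }


def query_nim(prompt: str, api_key: str) -> str:
    """POST the payload to the NIM endpoint and return the reply text."""
    req = urllib.request.Request(
        NIM_URL,
        data=json.dumps(build_request(prompt)).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


if __name__ == "__main__":
    key = os.environ.get("NVIDIA_API_KEY")
    if key:  # only reach out to the service when a key is configured
        print(query_nim("Explain test-time scaling in one sentence.", key))
```

The same request shape should work against a self-hosted NIM microservice by pointing `NIM_URL` at your own deployment, which is how an enterprise would keep traffic on its own infrastructure.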
Source: https://blogs.nvidia.com/blog/deepseek-r1-nim-microservice