NVIDIA’s latest GeForce RTX 5090 has outperformed AMD’s Radeon RX 7900 XTX when running DeepSeek’s R1 large language models (LLMs). The card’s fifth-generation Tensor Cores contribute to this advantage, delivering faster inference performance.
In a series of benchmarks, the GeForce RTX 5090 surpassed the Radeon RX 7900 XTX, reaching up to 200 tokens per second with the DeepSeek R1 Distill Qwen 7B and Distill Llama 8B models, almost twice the throughput of AMD’s card.
To make it easy for developers to access DeepSeek R1 on NVIDIA GPUs, the company has published a dedicated blog post and made the 671-billion-parameter model available as an NVIDIA NIM microservice preview on build.nvidia.com, where users can securely experiment with the model’s capabilities and build their own specialized agents.
The DeepSeek-R1 NIM microservice delivers up to 3,872 tokens per second on a single NVIDIA HGX H200 system. Developers can test and experiment with the application programming interface (API) today, and the model is expected to be available soon as a downloadable NIM microservice.
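For readers who want to try the hosted preview, here is a minimal sketch of calling it from Python. It assumes the preview follows NVIDIA’s usual NIM convention of an OpenAI-compatible endpoint at integrate.api.nvidia.com with the model registered as "deepseek-ai/deepseek-r1"; those two identifiers, and the environment-variable name for the key, are assumptions rather than details confirmed in this article.

```python
# Hedged sketch: query the DeepSeek-R1 NIM preview on build.nvidia.com.
# Assumptions (not stated in the article): the preview exposes NVIDIA's
# usual OpenAI-compatible gateway at integrate.api.nvidia.com, and the
# model identifier is "deepseek-ai/deepseek-r1".
# Requires `pip install openai` and an API key from build.nvidia.com.
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",  # hosted NIM gateway (assumed)
    api_key=os.environ["NVIDIA_API_KEY"],            # key generated on build.nvidia.com
)

# Stream the completion so tokens print as they are generated.
stream = client.chat.completions.create(
    model="deepseek-ai/deepseek-r1",
    messages=[{"role": "user", "content": "Explain what a distilled LLM is in two sentences."}],
    temperature=0.6,
    max_tokens=512,
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()
```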
With NVIDIA NIM, developers and enthusiasts can easily try the model on their own local builds, keeping data on-device and, where the hardware allows, improving performance.
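The distilled R1 variants benchmarked above can also be served entirely on a local GPU through any OpenAI-compatible runtime. The sketch below assumes Ollama as that runtime, with a DeepSeek-R1 distill already pulled (for example via `ollama run deepseek-r1:8b`); the model tag, port, and server choice are illustrative assumptions, not details from the article.

```python
# Hedged sketch: query a locally served DeepSeek-R1 distill.
# Assumption: a local OpenAI-compatible server (here, Ollama's default
# endpoint on port 11434) is already running a distilled model such as
# deepseek-r1:8b. The tag and port are Ollama defaults, not article details.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",  # Ollama's OpenAI-compatible endpoint
    api_key="ollama",                      # local servers accept any non-empty key
)

response = client.chat.completions.create(
    model="deepseek-r1:8b",  # e.g. the Distill Llama 8B variant
    messages=[{"role": "user", "content": "Why does local inference keep data private?"}],
)
print(response.choices[0].message.content)
```

Because nothing leaves the machine, this setup gives the data-security benefit described above while the GPU’s Tensor Cores handle the inference throughput.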
Source: https://wccftech.com/nvidia-geforce-rtx-5090-dominates-inference-performance-on-deepseeks-r1-ai-models