Cerebras Systems has announced significant advancements in its AI processing technology, outperforming cloud giants like AWS and Google on Meta’s Llama 3.1 405B model. The company claims its system can run even the largest models at “instant speed,” achieving an unprecedented 969 tokens per second.
According to third-party benchmark firm Artificial Analysis, Cerebras’ performance is up to 75 times faster than GPU-based offerings from major hyperscalers. That puts it well ahead of competitors such as Google Vertex and Azure, while AWS trails furthest behind at roughly 13 tokens per second.
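To put the throughput gap in concrete terms, a quick back-of-envelope calculation (using the benchmark figures above; the 1,000-token response length is a hypothetical example) shows how tokens per second translates into wall-clock response time:

```python
# Back-of-envelope: wall-clock time to stream a response at a given
# steady decode rate (tokens per second).
def generation_time(num_tokens: int, tokens_per_second: float) -> float:
    """Seconds to generate `num_tokens` at a steady throughput."""
    return num_tokens / tokens_per_second

# A hypothetical 1,000-token answer at the reported rates:
cerebras_s = generation_time(1000, 969)  # ~1.03 seconds
aws_s = generation_time(1000, 13)        # ~76.9 seconds
```

At those rates, a long answer arrives in about a second rather than over a minute, which is the substance of the “instant speed” claim.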
Cerebras’ AI processor, the WSE-3, boasts an impressive peak performance of 125 petaflops and features 44GB on-chip SRAM, four trillion transistors, and 900,000 AI-optimized cores. The system demonstrated the fastest time to first token in the world, clocking in at just 240 milliseconds.
“Cerebras holds the world record in Llama 3.1 8B and 70B performance,” said Andrew Feldman, co-founder and CEO of Cerebras. “This announcement extends our lead to Llama 3.1 405B, delivering real-time responses from the world’s leading open frontier model.”
The system supports full 128K context length at 16-bit precision, enabling new use cases such as reasoning and multi-agent collaboration across the AI landscape.
Customer trials for the Cerebras Inference system are ongoing, with general availability slated for Q1 2025. Pricing starts at $6 per million input tokens and $12 per million output tokens.
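At the published rates, the cost of a request is a simple linear function of its input and output token counts; a minimal sketch (the request sizes in the example are hypothetical):

```python
# Sketch of per-request cost at the quoted pricing:
# $6 per million input tokens, $12 per million output tokens.
def inference_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request at the published rates."""
    return input_tokens / 1_000_000 * 6 + output_tokens / 1_000_000 * 12

# e.g. a 10,000-token prompt with a 2,000-token completion:
cost = inference_cost(10_000, 2_000)  # $0.06 + $0.024 = $0.084
```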
Source: https://www.techradar.com/pro/nvidias-closest-rival-once-again-obliterates-cloud-giants-in-ai-performance