AMD has introduced its first series of fully open-source large language models (LLMs), called AMD OLMo. The 1-billion-parameter models offer strong reasoning capabilities, were pre-trained on AMD Instinct MI250 GPUs, and can be deployed both in datacenters and on personal devices equipped with neural processing units (NPUs).
The development of AMD OLMo is part of the company’s efforts to improve its position in the AI industry and provide clients with tools to deploy these open-source models using AMD hardware. By making the training recipes, code, and data available, AMD aims to empower developers to build upon these models for further innovation.
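For readers who want to try the models, loading an open-weight causal LM typically takes only a few lines with the Hugging Face `transformers` library. The sketch below is a minimal example; the repo ID `amd/AMD-OLMo-1B-SFT` is an assumption about where the checkpoints are published, not something stated in the article:

```python
# Minimal inference sketch for an AMD OLMo checkpoint via transformers.
# The repo ID below is an assumption, not confirmed by the article.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "amd/AMD-OLMo-1B-SFT"  # assumed Hugging Face repo ID

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "Explain why the sky is blue in one sentence."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```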
The AMD OLMo models were trained on a dataset of 1.3 trillion tokens across multiple nodes, each equipped with four Instinct MI250 GPUs. The lineup comprises three models, each produced by a successive training stage:
* An initial pre-trained model focused on next-token prediction
* A supervised fine-tuning (SFT) stage that refined the model's instruction-following capabilities
* A final version aligned to human preferences using Direct Preference Optimization (DPO); a sketch of the DPO objective follows this list
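To make the last stage concrete, the snippet below is a minimal sketch of the standard DPO loss, not AMD's actual training code. Given log-probabilities of a preferred ("chosen") and dispreferred ("rejected") response under the policy being trained and a frozen reference model, DPO pushes the policy to widen the preference margin relative to the reference:

```python
# Sketch of the Direct Preference Optimization (DPO) loss in PyTorch.
# This illustrates the published DPO objective, not AMD's training code.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    # Log-ratio of policy vs. reference for each response.
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    # Loss is small when the policy prefers the chosen response
    # more strongly than the reference model does.
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()

# Toy example with made-up summed log-probabilities per response.
loss = dpo_loss(
    policy_chosen_logp=torch.tensor([-12.0]),
    policy_rejected_logp=torch.tensor([-15.0]),
    ref_chosen_logp=torch.tensor([-13.0]),
    ref_rejected_logp=torch.tensor([-14.0]),
)
print(loss.item())
```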
AMD’s OLMo models have demonstrated strong performance on standard benchmarks, outperforming similarly sized open-source models. The two-phase supervised fine-tuning model showed significant accuracy improvements, with a 5.09% increase on MMLU and a 15.32% gain on GSM8k. The final DPO-aligned version also outperformed other chat models by at least 2.60% on average across benchmarks.
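The article does not say which evaluation tooling AMD used; one common way to reproduce scores on benchmarks like MMLU and GSM8k is EleutherAI's lm-evaluation-harness (`pip install lm-eval`). A hedged sketch, again assuming the checkpoint repo ID:

```python
# Hedged evaluation sketch using EleutherAI's lm-evaluation-harness.
# The article does not state AMD's tooling; the repo ID is an assumption.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=amd/AMD-OLMo-1B-SFT",  # assumed repo ID
    tasks=["mmlu", "gsm8k"],
)
print(results["results"])
```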
Furthermore, AMD’s OLMo models have been tested on responsible AI tasks, such as toxic language detection and bias evaluation, where they performed on par with similar models.
Source: https://www.tomshardware.com/tech-industry/artificial-intelligence/amd-unveils-amd-olmo-its-first-1b-parameter-llm-with-strong-reasoning