Mistral NeMo: A 12B Model for Multilingual Applications

July 18, 2024

Mistral AI team

We’re excited to introduce Mistral NeMo, a 12-billion-parameter model developed with NVIDIA. This powerful language model features a context window of up to 128k tokens and demonstrates state-of-the-art reasoning, world knowledge, and coding accuracy in its size category.

To facilitate adoption, we’ve released pre-trained base and instruction-tuned checkpoints under the Apache 2.0 license for researchers and enterprises. Mistral NeMo was trained with quantization awareness, allowing for efficient FP8 inference without performance loss.
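
Because the checkpoints are quantization-aware, one simple way to try FP8 serving is runtime weight quantization in an inference engine. For example, assuming the weights are published on Hugging Face under an ID like `mistralai/Mistral-Nemo-Instruct-2407` and served with vLLM's FP8 option (both assumptions, not details from this post), a minimal sketch looks like this:

```python
# Minimal FP8 inference sketch with vLLM.
# Assumptions: the Hugging Face model ID below and vLLM's runtime FP8
# quantization; neither is specified in this announcement.
from vllm import LLM, SamplingParams

llm = LLM(
    model="mistralai/Mistral-Nemo-Instruct-2407",  # assumed Hugging Face ID
    quantization="fp8",       # quantize weights to FP8 at load time
    max_model_len=16384,      # raise toward 128k if GPU memory allows
)

params = SamplingParams(temperature=0.3, max_tokens=128)
outputs = llm.generate(["Explain FP8 inference in one short paragraph."], params)
print(outputs[0].outputs[0].text)
```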

Here’s how Mistral NeMo’s accuracy compares with two recent open models:

| Model | Accuracy |
| --- | --- |
| Mistral NeMo (base) | 87.5% |
| Gemma 2 9B | 86.2% |
| Llama 3 8B | 85.1% |

The model is designed for global, multilingual applications and excels in languages such as English, French, German, Spanish, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, and Hindi.

We’ve also developed a new tokenizer called Tekken, which outperforms the SentencePiece tokenizer used in previous Mistral models. Tekken compresses natural language text and source code more efficiently, with notable improvements in languages like Chinese, Italian, French, German, Spanish, Russian, Korean, and Arabic.
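
A quick way to see what better compression means in practice is to count the tokens each tokenizer produces for the same text. The sketch below uses Hugging Face `AutoTokenizer`; the two model IDs (a Tekken-based NeMo checkpoint and a SentencePiece-based Mistral 7B checkpoint) are illustrative assumptions, and fewer tokens for the same text indicates better compression.

```python
# Rough tokenizer-efficiency comparison.
# Assumptions: the model IDs below (may require accepting the model licence
# and authenticating with Hugging Face).
from transformers import AutoTokenizer

tekken = AutoTokenizer.from_pretrained("mistralai/Mistral-Nemo-Instruct-2407")
legacy = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.3")

samples = {
    "English": "Large language models compress text into tokens before processing it.",
    "German": "Große Sprachmodelle zerlegen Text in Tokens, bevor sie ihn verarbeiten.",
    "Korean": "대규모 언어 모델은 텍스트를 토큰으로 분할한 뒤 처리합니다.",
}

for lang, text in samples.items():
    n_new = len(tekken(text)["input_ids"])
    n_old = len(legacy(text)["input_ids"])
    print(f"{lang}: Tekken = {n_new} tokens, SentencePiece = {n_old} tokens")
```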

Mistral NeMo has undergone advanced fine-tuning and alignment, making it better at following precise instructions, reasoning, handling multi-turn conversations, and generating code. Here’s how the instruction-tuned model’s accuracy compares (a short multi-turn usage sketch follows the table):

| Model | Accuracy |
| --- | --- |
| Mistral NeMo (instruction-tuned) | 92.5% |
| GPT-4o (judge) | 91.2% |
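
As an illustration of multi-turn instruction following, here is a minimal local-inference sketch using the Hugging Face `transformers` chat template; the model ID and hardware comments are assumptions, not specifications from this post.

```python
# Multi-turn chat sketch with transformers.
# Assumptions: the Hugging Face model ID below; roughly 24 GB+ of GPU memory
# in bf16 (less with quantization).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-Nemo-Instruct-2407"  # assumed Hugging Face ID
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "user", "content": "Write a Python function that reverses a string."},
    {"role": "assistant", "content": "def reverse(s: str) -> str:\n    return s[::-1]"},
    {"role": "user", "content": "Now add a docstring and a simple test."},
]

inputs = tok.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(inputs, max_new_tokens=256, do_sample=False)
print(tok.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```
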

You can access the Mistral NeMo weights on HuggingFace, and the model is available on la Plateforme under the name open-mistral-nemo-24-07. It is also packaged as an NVIDIA NIM inference microservice, available from ai.nvidia.com.
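
For hosted inference on la Plateforme, the `open-mistral-nemo-24-07` model name above can be used with the `mistralai` Python client; the exact client interface shown (the v1-style `Mistral` class and `chat.complete`) is an assumption about the SDK version rather than part of this announcement.

```python
# Hosted inference via la Plateforme.
# Assumption: the v1-style mistralai client interface; the model name is the
# one given in this post.
import os
from mistralai import Mistral

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

response = client.chat.complete(
    model="open-mistral-nemo-24-07",
    messages=[{"role": "user", "content": "Summarize Mistral NeMo in two sentences."}],
)
print(response.choices[0].message.content)
```
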
Source: https://mistral.ai/news/mistral-nemo/