New Mistral 7B LLM outperforms Llama 2 13B on many benchmarks

Mistral 7B in short
Mistral 7B is a 7.3B-parameter model that:

Outperforms Llama 2 13B on all benchmarks
Outperforms Llama 1 34B on many benchmarks
Approaches CodeLlama 7B performance on code, while remaining good at English tasks
Uses Grouped-query attention (GQA) for faster inference
Uses Sliding Window Attention (SWA) to handle longer sequences at smaller cost
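
To make the last two points more concrete, here is a minimal pure-Python sketch of the two ideas. It is an illustration, not Mistral's actual code: the function names are hypothetical, the sizes are toy values rather than the model's real configuration, and the window is assumed to include the current token. In sliding window attention each token only looks back over a fixed number of recent positions, and in grouped-query attention several query heads share one key/value head.

```python
def sliding_window_mask(seq_len, window):
    """Return mask[i][j] == True when query position i may attend to key position j.

    Assumption: causal attention where the window counts the current token,
    so position i sees positions i - window + 1 .. i.
    """
    return [
        [0 <= i - j < window for j in range(seq_len)]
        for i in range(seq_len)
    ]


def kv_head_for_query_head(q_head, n_q_heads, n_kv_heads):
    """Map a query head index to the key/value head it shares under GQA."""
    if n_q_heads % n_kv_heads != 0:
        raise ValueError("n_q_heads must be a multiple of n_kv_heads")
    group_size = n_q_heads // n_kv_heads  # query heads per shared K/V head
    return q_head // group_size


# Toy sizes chosen for readability only.
mask = sliding_window_mask(seq_len=5, window=3)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))

# 8 query heads sharing 2 key/value heads: heads 0-3 use K/V head 0, heads 4-7 use K/V head 1.
print([kv_head_for_query_head(h, 8, 2) for h in range(8)])
```

Each row of the mask has at most `window` allowed positions, so the per-token attention cost is bounded by the window size instead of growing with the full sequence length. With grouped-query attention, fewer key/value heads need to be stored, which shrinks the cache used during inference.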

Towards AGI
