Microsoft teases Samba 3.8B, a new hybrid SSM that outperforms the recent Phi3-mini

The Microsoft researchers have uploaded their documentation to GitHub.




Samba 3.8B

In a time when language models are becoming more complicated and sometimes confusing, Microsoft researchers working with the University of Illinois at Urbana-Champaign have created something simple but powerful: Samba 3.8B.

This is not just another model in a sea of options; it’s a hybrid architecture that brings together State Space Models (SSMs) and attention mechanisms, and it outperforms models such as Phi3-mini on major benchmarks. What sets Samba apart is its ability to handle sequences of unlimited length while keeping linear time complexity.

In terms of architecture, Samba combines Mamba, SwiGLU, and Sliding Window Attention (SWA) layers. The Mamba layers act as a recurrent backbone that captures time-dependent semantics, while the SWA layers handle the non-Markovian dependencies that the recurrent state cannot retain.

Working together, they form a system that decodes efficiently while still modeling complex dependencies. On top of that, the Multi-Layer Perceptron (MLP) layers, implemented as SwiGLU, strengthen the model’s ability to perform nonlinear transformations and recall factual knowledge.
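To make that layer pattern concrete, here is a minimal, hypothetical PyTorch sketch of a Samba-style block. The interleaving order (Mamba, MLP, SWA, MLP) follows the description above, but the SimpleSSM and SlidingWindowAttention classes, along with all dimensions, are simplified stand-ins for illustration only, not Microsoft’s actual Mamba kernel or attention implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLU(nn.Module):
    """Gated MLP: down( silu(gate(x)) * up(x) )."""
    def __init__(self, d_model, d_hidden):
        super().__init__()
        self.gate = nn.Linear(d_model, d_hidden, bias=False)
        self.up = nn.Linear(d_model, d_hidden, bias=False)
        self.down = nn.Linear(d_hidden, d_model, bias=False)

    def forward(self, x):
        return self.down(F.silu(self.gate(x)) * self.up(x))

class SlidingWindowAttention(nn.Module):
    """Causal self-attention restricted to a fixed window of past tokens."""
    def __init__(self, d_model, n_heads, window):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.window = window

    def forward(self, x):
        T = x.size(1)
        i = torch.arange(T).unsqueeze(1)
        j = torch.arange(T).unsqueeze(0)
        # True = blocked: future tokens and tokens older than the window.
        mask = (j > i) | (j < i - self.window + 1)
        out, _ = self.attn(x, x, x, attn_mask=mask)
        return out

class SimpleSSM(nn.Module):
    """Toy recurrent state-space stand-in for a Mamba layer (linear-time scan)."""
    def __init__(self, d_model):
        super().__init__()
        self.in_proj = nn.Linear(d_model, d_model)
        self.decay = nn.Parameter(torch.full((d_model,), 0.9))
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x):
        u = self.in_proj(x)
        h = torch.zeros_like(u[:, 0])
        outs = []
        for t in range(u.size(1)):  # O(T) sequential scan over the sequence
            h = self.decay * h + u[:, t]
            outs.append(h)
        return self.out_proj(torch.stack(outs, dim=1))

class SambaBlock(nn.Module):
    """Mamba -> MLP -> SWA -> MLP, each with pre-norm and a residual connection."""
    def __init__(self, d_model=256, n_heads=8, window=2048, d_hidden=1024):
        super().__init__()
        self.layers = nn.ModuleList([
            SimpleSSM(d_model),
            SwiGLU(d_model, d_hidden),
            SlidingWindowAttention(d_model, n_heads, window),
            SwiGLU(d_model, d_hidden),
        ])
        self.norms = nn.ModuleList([nn.LayerNorm(d_model) for _ in self.layers])

    def forward(self, x):
        for norm, layer in zip(self.norms, self.layers):
            x = x + layer(norm(x))
        return x

if __name__ == "__main__":
    block = SambaBlock()
    tokens = torch.randn(1, 64, 256)   # (batch, sequence, d_model)
    print(block(tokens).shape)          # torch.Size([1, 64, 256])
```

The point the sketch illustrates is the division of labor: the SSM scan costs linear time in sequence length, while attention is confined to a fixed window, which is what allows the combined architecture to keep scaling to very long contexts.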

The researchers did not stop at designing the architecture; they scaled it to a 3.8B-parameter model pre-trained on 3.2T tokens. The results were impressive, with Samba 3.8B surpassing open-source language models of up to 8B parameters and performing strongly across tasks ranging from commonsense reasoning to coding. Notably, it scores 18.1% higher on GSM8K than a comparable Transformer++ baseline.

Now, how does this relate to you and me? For a start, Samba’s hybrid architecture could serve as a practical solution for complex natural language processing tasks.

Its ability to handle unlimited context lengths, together with strong memory extrapolation, makes it especially well suited for real-world applications that require a deep understanding of long contexts. Picture a world where language models grasp our intentions more clearly, making technology more intuitive and helpful.

The researchers have uploaded their findings and models to GitHub for anyone who wants to explore them further, a sign of their commitment to advancing language modeling and giving the community tools for the challenges ahead.

So, if you’re a researcher, developer, or simply someone intrigued by how technology aids us in comprehending language better, Samba 3.8B is an important development to watch.

