Open SLM Leaderboard

A leaderboard for language models under 150M parameters, evaluated using the LM-eval harness or Arithmark-2.0, a custom benchmark script.

To support our work and help us keep this leaderboard up to date, please consider liking and following this Space!

Notable New Releases

Short notes on new models and leaderboard shifts.

12/07/2026 change

Leaderboard Math Changes

After the recent influx of arithmetic specialist models, we have decided to change the way we evaluate math performance on the leaderboard. More details to come.

Leaderboard Change · Math

01/07/2026 new

Atom2.7m posts standout ArithMark 2.0 result

UCR's 2.74M SLM uses custom tokenization and digit features to reach 69.4% ArithMark accuracy, setting the leaderboard's #1 score.

UCR · 2.7M params

19/06/2026 new

Axiomic Labs GPT-S2 takes <10M #1 spot

Built on Axiomic Labs' T-X4 refresh-gated XSA stack, GPT-S2-5M pushes into the sub-10M bracket and lands at the top, dethroning SLM-10M.

Axiomic Labs · 5.4M params

16/06/2026 new

Veyra2-Apricot-50M-Base takes the <50M lead

Veyra AI's 49M Apricot checkpoint posts a 37.63 average, setting the strongest score in the sub-50M bracket.

veyra-ai · 49M params

Leaderboard

Zero-shot evaluation. Higher is better for all columns. Click any header to sort.

Compare	# ▼	Model	Params	Avg	Fit Std Devs	HellaSwag	ARC-Easy	ARC-Challenge	PIQA	ArithMark-2
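For readers who want to reproduce the harness columns, here is a minimal sketch of a zero-shot run. It only builds the `lm_eval` command line for the four harness tasks and does not execute it. The model id, batch size, and output directory are placeholders, and the leaderboard's exact settings (for example which accuracy metric is reported) are not stated here, so treat this as an assumption rather than the official recipe. ArithMark-2 is run by a separate custom script and is not covered.

```python
import shlex

# Task names as registered in lm-evaluation-harness.
# ArithMark-2 is not a harness task and is deliberately left out.
TASKS = ["hellaswag", "arc_easy", "arc_challenge", "piqa"]


def build_lm_eval_command(model_id: str, output_dir: str = "results") -> list[str]:
    """Build a zero-shot lm_eval CLI invocation for a Hugging Face model."""
    return [
        "lm_eval",
        "--model", "hf",
        "--model_args", f"pretrained={model_id}",
        "--tasks", ",".join(TASKS),
        "--num_fewshot", "0",      # zero-shot, as stated for this leaderboard
        "--batch_size", "8",       # assumed value, tune for your hardware
        "--output_path", output_dir,
    ]


# "your-org/your-model" is a placeholder, not a real submission.
print(shlex.join(build_lm_eval_command("your-org/your-model")))
```

The printed command can be pasted into a shell where lm-evaluation-harness is installed.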

Compare

Models you add from the leaderboard line up side by side here.

Up to 6 models at a time

Scores

Top scores for the active size and benchmark filters.

Top Avg Scores

Efficiency

Score vs parameter count (log scale). Shaded zone = above regression line.

Avg Score vs Log Parameters
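One plausible reading of the regression line and the Fit Std Devs column is sketched below: fit average score against log10 of parameter count by least squares, then express each model's residual in units of the residual standard deviation. The function names and the toy numbers are illustrative only, and the leaderboard's actual fitting procedure may differ.

```python
import math
import statistics


def fit_line(params, scores):
    """Least-squares fit of score against log10(parameter count)."""
    xs = [math.log10(p) for p in params]
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(scores)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, scores))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_std_devs(params, scores):
    """Residual of each model from the fit line, in residual std devs."""
    slope, intercept = fit_line(params, scores)
    residuals = [
        s - (slope * math.log10(p) + intercept) for p, s in zip(params, scores)
    ]
    sigma = statistics.pstdev(residuals)  # spread of models around the line
    return [r / sigma for r in residuals]


# Toy data, not real leaderboard entries.
toy_params = [1e6, 1e7, 1e8, 1e7]
toy_scores = [30.0, 36.0, 40.0, 34.0]
for p, z in zip(toy_params, fit_std_devs(toy_params, toy_scores)):
    print(f"{p:>12,.0f} params  {z:+.2f} std devs vs fit")
```

A positive value means the model sits above the line (the shaded zone), so it scores higher than the fit predicts for its size.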

Org Leaderboard

Average standard deviations above or below the score-vs-size fit line.

#	Organization	Models	Fit Std Devs	Mean Avg	Best Model vs Fit
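The organization table can be read as a per-organization roll-up of the per-model numbers. The sketch below assumes each model already has an average score and a fit-relative value in standard deviations, that organizations are ranked by their mean fit-relative value, and that "Best Model vs Fit" is the model furthest above the line. The field names, org names, and values are hypothetical.

```python
from collections import defaultdict
from statistics import fmean


def org_leaderboard(rows):
    """Aggregate per-model rows into one row per organization."""
    by_org = defaultdict(list)
    for row in rows:
        by_org[row["org"]].append(row)

    table = []
    for org, models in by_org.items():
        best = max(models, key=lambda m: m["fit_sd"])
        table.append({
            "org": org,
            "models": len(models),
            "fit_sd": fmean(m["fit_sd"] for m in models),
            "mean_avg": fmean(m["avg"] for m in models),
            "best_model": best["model"],
        })
    # Assumed ranking: highest mean std devs above the fit line first.
    table.sort(key=lambda r: r["fit_sd"], reverse=True)
    return table


# Toy rows, not real leaderboard entries.
toy_rows = [
    {"org": "org-a", "model": "m1", "avg": 30.0, "fit_sd": 0.5},
    {"org": "org-a", "model": "m2", "avg": 34.0, "fit_sd": 1.5},
    {"org": "org-b", "model": "m3", "avg": 40.0, "fit_sd": -0.2},
]
for rank, row in enumerate(org_leaderboard(toy_rows), start=1):
    print(rank, row)
```

Ranking by mean distance from the fit, rather than by raw average, keeps organizations that only publish very small models comparable with those that publish larger ones.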

Add your model

Open a PR or discussion on this Space with your model's results for the given benchmarks. Our team will independently verify them and then merge your PR. Your model must be open-weights to qualify.