A leaderboard for language models with fewer than 150M parameters, evaluated with the LM-eval harness or with a custom benchmark script, Arithmark-2.0.
All evaluations are zero-shot. Higher is better for every score column.
| # | Model | Params | Avg | HellaSwag | ARC-Easy | ARC-Challenge | PIQA | ArithMark-2 |
|---|---|---|---|---|---|---|---|---|
Figure: average score vs. parameter count (log scale). The shaded zone marks the region above the regression line.
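The regression line in the chart can be illustrated with a minimal sketch. It assumes an ordinary least-squares fit of the average score against log10 of the parameter count, which may not be exactly how the chart is computed. The function names `fit_log_linear` and `above_line` and the sample models are hypothetical and are not leaderboard entries.

```python
import math


def fit_log_linear(params, scores):
    """Least-squares fit of score = slope * log10(params) + intercept.

    Assumption: the chart's regression is a plain OLS fit on a log10 x-axis.
    """
    xs = [math.log10(p) for p in params]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(scores) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, scores))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def above_line(params, scores):
    """Return one boolean per model: True if its score exceeds the fitted line."""
    slope, intercept = fit_log_linear(params, scores)
    return [
        score > slope * math.log10(p) + intercept
        for p, score in zip(params, scores)
    ]


# Hypothetical data for illustration only (name, parameter count, average score).
models = [
    ("model-a", 1e6, 20.0),
    ("model-b", 1e7, 35.0),
    ("model-c", 1e8, 40.0),
]
flags = above_line([m[1] for m in models], [m[2] for m in models])
for (name, _, _), is_above in zip(models, flags):
    print(name, "above line" if is_above else "on or below line")
```

Under this reading, a model in the shaded zone scores higher than the fit predicts for its size, so it is more parameter-efficient than the trend across submitted models.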
To submit a model, open a PR on this Space with its results on the benchmarks listed above. Our team independently verifies the results before merging the PR. Only open-weights models qualify.