Wop

# One Script, Every Benchmark

Every Bench Labs benchmark had its own eval files. Now there's one script, hosted in the leaderboard Space:

```bash
curl -sLO https://huggingface.co/spaces/bench-labs/BenchLabs-Leaderboard/resolve/main/script.py
pip install torch transformers
python script.py --model your/model
```

It runs the full suite with the official scoring: Effortless (exact-match), Easy (hybrid category-aware), and Mid (loglikelihood: `acc`, `acc_norm`, `soft_score_norm`). It reports every category and subcategory, not just one number.
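For readers unfamiliar with the loglikelihood metrics, here is a minimal sketch of how `acc` and `acc_norm` are commonly computed for multiple-choice items. It assumes the convention used by popular evaluation harnesses (raw argmax for `acc`, byte-length-normalized argmax for `acc_norm`); the post does not spell out the exact definitions, and `soft_score_norm` is left out because its formula is not given. The function names and the toy data are hypothetical.

```python
def pick_choice(loglikelihoods, choices, normalize=False):
    """Return the index of the highest-scoring answer choice."""
    scores = []
    for ll, text in zip(loglikelihoods, choices):
        if normalize:
            # Assumed normalization: divide by the UTF-8 byte length of the choice,
            # so long answers are not penalized just for having more tokens.
            ll = ll / len(text.encode("utf-8"))
        scores.append(ll)
    return max(range(len(scores)), key=scores.__getitem__)


def accuracy(items, normalize=False):
    """Fraction of items where the picked choice matches the gold index."""
    correct = sum(
        pick_choice(item["loglikelihoods"], item["choices"], normalize) == item["gold"]
        for item in items
    )
    return correct / len(items)


# Toy data (made up): summed log-probabilities a model might assign to each choice.
items = [
    {"choices": ["no", "yes, it certainly does"], "loglikelihoods": [-2.0, -6.0], "gold": 1},
    {"choices": ["cat", "dog"], "loglikelihoods": [-1.0, -3.0], "gold": 0},
]

acc = accuracy(items)                       # raw loglikelihood argmax
acc_norm = accuracy(items, normalize=True)  # length-normalized argmax
print(acc, acc_norm)
```

In the toy data the long correct answer loses on raw loglikelihood but wins after normalization, which is why the two numbers can differ.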

Output includes `leaderboard.json`: a ready-to-paste `models.json` entry. Run the script, paste it, open a PR on the [leaderboard](https://huggingface.co/spaces/bench-labs/BenchLabs-Leaderboard). Done.
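If you would rather not paste by hand, the step can be scripted. This is only a sketch under loud assumptions: it assumes `leaderboard.json` holds a single JSON object and `models.json` is a JSON array of such objects, neither of which the post confirms. The helper `append_entry` and the `model` field in the demo are hypothetical, so check the real files in the Space before relying on it.

```python
import json
import tempfile
from pathlib import Path


def append_entry(entry_path, models_path):
    """Append the entry in entry_path to the JSON array stored in models_path."""
    entry = json.loads(Path(entry_path).read_text(encoding="utf-8"))
    models = json.loads(Path(models_path).read_text(encoding="utf-8"))
    if not isinstance(models, list):
        # Assumption check: bail out if models.json is not a plain array.
        raise ValueError("expected models.json to hold a JSON array")
    models.append(entry)
    Path(models_path).write_text(json.dumps(models, indent=2) + "\n", encoding="utf-8")
    return models


# Demo on throwaway files so nothing real is touched; the entry shape is made up.
with tempfile.TemporaryDirectory() as tmp:
    entry_file = Path(tmp) / "leaderboard.json"
    models_file = Path(tmp) / "models.json"
    entry_file.write_text(json.dumps({"model": "your/model"}), encoding="utf-8")
    models_file.write_text("[]", encoding="utf-8")
    merged = append_entry(entry_file, models_file)
    print(merged)
```

Against a real checkout you would call `append_entry("leaderboard.json", "models.json")` and review the diff before opening the PR.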

https://huggingface.co/bench-labs