huggingface/yourbench

huggingface / yourbench

🤗 Benchmark Large Language Models Reliably On Your Data

☆451

Alternatives and similar repositories for yourbench

Users who are interested in yourbench are comparing it to the libraries listed below.

sumukshashidhar / yourbench
Benchmark Large Language Models Reliably On Your Data
☆18 · Dec 27, 2025 · Updated 6 months ago
huggingface / lighteval
Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends
☆2,486 · Jun 29, 2026 · Updated 3 weeks ago
huggingface / evaluation-guidebook
Sharing both practical insights and theoretical knowledge about LLM evaluation that we gathered while managing the Open LLM Leaderboard a…
☆2,127 · Dec 3, 2025 · Updated 7 months ago
brendanhogan / completion_tree_view
☆15 · Apr 26, 2025 · Updated last year
meta-llama / synthetic-data-kit
Tool for generating high quality Synthetic datasets
☆1,617 · Oct 28, 2025 · Updated 8 months ago
OpenPipe / deductive-reasoning
Train your own SOTA deductive reasoning model
☆111 · Mar 6, 2025 · Updated last year
AnswerDotAI / fastdata
☆160 · Dec 2, 2024 · Updated last year
argilla-io / synthetic-data-generator
Build datasets using natural language
☆586 · Sep 19, 2025 · Updated 10 months ago
MinishLab / semhash
Fast Multimodal Semantic Deduplication & Filtering
☆946 · May 24, 2026 · Updated last month
Pleias / Pleias-RAG-Library
Python library to use Pleias-RAG models
☆72 · Jul 1, 2026 · Updated 2 weeks ago
SparkJiao / StructTest
☆19 · Jul 24, 2025 · Updated 11 months ago
huggingface / datatrove
Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.
☆3,214 · Updated this week
PrimeIntellect-ai / lab-cookbook
Lab Cookbook
☆37 · Updated this week
huggingface / smollm
Everything about the SmolLM and SmolVLM family of models
☆3,849 · May 26, 2026 · Updated last month
huggingface / feel
☆15 · May 26, 2026 · Updated last month
dnotitia / smoothie-qwen
A lightweight adjustment tool for smoothing token probabilities in the Qwen models to encourage balanced multilingual generation.
☆106 · Jul 9, 2025 · Updated last year
run-llama / human_in_the_loop_workflow_demo
☆74 · Sep 27, 2024 · Updated last year
huggingface / lm-evaluation-harness
A framework for few-shot evaluation of language models.
☆37 · Apr 3, 2026 · Updated 3 months ago
bminixhofer / tokenkit
A toolkit implementing advanced methods to transfer models and model knowledge across tokenizers.
☆69 · Jul 6, 2025 · Updated last year
gradio-app / trackio
A lightweight, local-first, and free experiment tracking library from Hugging Face 🤗
☆1,584 · Updated this week
LG-AI-EXAONE / KMMLU-Pro
☆16 · Aug 18, 2025 · Updated 11 months ago
flowaicom / flow-judge
Code for evaluating with Flow-Judge-v0.1 - an open-source, lightweight (3.8B) language model optimized for LLM system evaluations. Crafte…
☆86 · Oct 29, 2024 · Updated last year
huggingface / huggingface-inference-toolkit
Hugging Face Inference Toolkit used to serve transformers, sentence-transformers, and diffusers models.
☆94 · May 28, 2026 · Updated last month
huggingface / gpt-oss-recipes
Collection of scripts and notebooks for OpenAI's latest GPT OSS models
☆506 · Aug 25, 2025 · Updated 10 months ago
brendanhogan / DeepSeekRL-Extended
Exploring Applications of GRPO
☆252 · Aug 25, 2025 · Updated 10 months ago
QuixiAI / dolphin-logger
☆107 · Nov 1, 2025 · Updated 8 months ago
HAE-RAE / haerae-evaluation-toolkit
The most modern LLM evaluation toolkit
☆70 · Apr 30, 2026 · Updated 2 months ago
huggingface / data-is-better-together
Let's build better datasets, together!
☆273 · Jun 9, 2026 · Updated last month
argilla-io / distilabel
Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verifi…
☆3,334 · Jul 13, 2026 · Updated last week
KRLabsOrg / rulechef
Learn rule-based models from examples using LLM-powered synthesis. Replace expensive LLM calls with fast, deterministic, inspectable rege…
☆29 · Jul 10, 2026 · Updated last week
xeophon / beam
☆16 · Feb 22, 2026 · Updated 4 months ago
huggingface / smolagents
🤗 smolagents: a barebones library for agents that think in code.
☆28,449 · Updated this week
bespokelabsai / curator
Synthetic data curation for post-training and structured data extraction
☆1,704 · Jul 13, 2026 · Updated last week
NVIDIA-NeMo / DataDesigner
🎨 NeMo Data Designer: Generate high-quality synthetic data from scratch or from seed data.
☆2,106 · Updated this week
Essential-AI / eai-taxonomy
☆59 · Aug 19, 2025 · Updated 11 months ago
facebookresearch / ZeroSumEval
A framework for pitting LLMs against each other in an evolving library of games ⚔
☆35 · Apr 20, 2025 · Updated last year
SalesforceAIResearch / PretrainRL-pipeline
An automated data pipeline scaling RL to pretraining levels
☆76 · Jun 2, 2026 · Updated last month
huggingface / hf-endpoints-documentation
☆27 · Jun 23, 2026 · Updated 3 weeks ago
PrimeIntellect-ai / verifiers
Our library for RL environments + evals
☆4,389 · Updated this week