🤗 Benchmark Large Language Models Reliably On Your Data
☆434 · Apr 2, 2026 · Updated last week
Alternatives and similar repositories for yourbench
Users who are interested in yourbench are comparing it to the libraries listed below.
- Benchmark Large Language Models Reliably On Your Data ☆18 · Dec 27, 2025 · Updated 3 months ago
- Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends ☆2,364 · Apr 2, 2026 · Updated last week
- Sharing both practical insights and theoretical knowledge about LLM evaluation that we gathered while managing the Open LLM Leaderboard a… ☆2,087 · Dec 3, 2025 · Updated 4 months ago
- ☆15 · Apr 26, 2025 · Updated 11 months ago
- Tool for generating high quality Synthetic datasets ☆1,554 · Oct 28, 2025 · Updated 5 months ago
- A toolkit implementing advanced methods to transfer models and model knowledge across tokenizers. ☆65 · Jul 6, 2025 · Updated 9 months ago
- Train your own SOTA deductive reasoning model ☆109 · Mar 6, 2025 · Updated last year
- X-Reasoner: Towards Generalizable Reasoning Across Modalities and Domains ☆50 · Feb 4, 2026 · Updated 2 months ago
- ☆162 · Dec 2, 2024 · Updated last year
- Fast Multimodal Semantic Deduplication & Filtering ☆909 · Jan 20, 2026 · Updated 2 months ago
- Build datasets using natural language ☆573 · Sep 19, 2025 · Updated 6 months ago
- ☆18 · Dec 2, 2025 · Updated 4 months ago
- moodist ☆25 · Updated this week
- Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks. ☆2,978 · Apr 2, 2026 · Updated last week
- ☆19 · Jul 24, 2025 · Updated 8 months ago
- Everything about the SmolLM and SmolVLM family of models ☆3,696 · Apr 2, 2026 · Updated last week
- ☆14 · Apr 2, 2026 · Updated last week
- A framework for pitting LLMs against each other in an evolving library of games ☆34 · Apr 20, 2025 · Updated 11 months ago
- ☆74 · Sep 27, 2024 · Updated last year
- A lightweight adjustment tool for smoothing token probabilities in the Qwen models to encourage balanced multilingual generation. ☆104 · Jul 9, 2025 · Updated 9 months ago
- Synthetic Text Dataset Generation for LLM projects ☆58 · Mar 26, 2026 · Updated 2 weeks ago
- Python library to use Pleias-RAG models ☆71 · May 1, 2025 · Updated 11 months ago
- Exploring Applications of GRPO ☆252 · Aug 25, 2025 · Updated 7 months ago
- Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verifi… ☆3,155 · Mar 30, 2026 · Updated last week
- ☆24 · Apr 2, 2026 · Updated last week
- Code for evaluating with Flow-Judge-v0.1 - an open-source, lightweight (3.8B) language model optimized for LLM system evaluations. Crafte… ☆84 · Oct 29, 2024 · Updated last year
- Async RL Training at Scale ☆1,266 · Updated this week
- Let's build better datasets, together! ☆271 · Apr 2, 2026 · Updated last week
- Hugging Face Inference Toolkit used to serve transformers, sentence-transformers, and diffusers models. ☆93 · Apr 2, 2026 · Updated last week
- Synthetic data curation for post-training and structured data extraction ☆1,654 · Mar 28, 2026 · Updated last week
- The LLM Evaluation Framework ☆14,519 · Updated this week
- ☆107 · Nov 1, 2025 · Updated 5 months ago
- 🤗 smolagents: a barebones library for agents that think in code. ☆26,494 · Apr 2, 2026 · Updated last week
- A course on aligning smol models. ☆6,628 · Updated this week
- ☆124 · Updated this week
- Robust recipes to align language models with human and AI preferences ☆5,551 · Apr 2, 2026 · Updated last week
- The most modern LLM evaluation toolkit ☆70 · Nov 9, 2025 · Updated 5 months ago
- A framework for few-shot evaluation of language models. ☆12,020 · Apr 1, 2026 · Updated last week
- awesome synthetic (text) datasets ☆327 · Jan 8, 2026 · Updated 3 months ago