🤗 Benchmark Large Language Models Reliably On Your Data
★445 · Apr 2, 2026 · Updated last month
Alternatives and similar repositories for yourbench
Users that are interested in yourbench are comparing it to the libraries listed below.
- Benchmark Large Language Models Reliably On Your Data ★18 · Dec 27, 2025 · Updated 4 months ago
- Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends ★2,415 · May 7, 2026 · Updated last week
- Sharing both practical insights and theoretical knowledge about LLM evaluation that we gathered while managing the Open LLM Leaderboard a… ★2,111 · Dec 3, 2025 · Updated 5 months ago
- ★15 · Apr 26, 2025 · Updated last year
- Tool for generating high quality Synthetic datasets ★1,583 · Oct 28, 2025 · Updated 6 months ago
- A toolkit implementing advanced methods to transfer models and model knowledge across tokenizers. ★67 · Jul 6, 2025 · Updated 10 months ago
- X-Reasoner: Towards Generalizable Reasoning Across Modalities and Domains ★49 · Feb 4, 2026 · Updated 3 months ago
- Train your own SOTA deductive reasoning model ★110 · Mar 6, 2025 · Updated last year
- ★162 · Dec 2, 2024 · Updated last year
- Fast Multimodal Semantic Deduplication & Filtering ★926 · May 4, 2026 · Updated 2 weeks ago
- Build datasets using natural language ★575 · Sep 19, 2025 · Updated 8 months ago
- ★18 · Dec 2, 2025 · Updated 5 months ago
- moodist ★28 · Apr 23, 2026 · Updated 3 weeks ago
- Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks. ★3,058 · May 6, 2026 · Updated 2 weeks ago
- ★19 · Jul 24, 2025 · Updated 9 months ago
- ★14 · Apr 8, 2026 · Updated last month
- Everything about the SmolLM and SmolVLM family of models ★3,777 · Apr 2, 2026 · Updated last month
- Vibe-coding tools for the LlamaIndex ecosystem ★176 · Nov 3, 2025 · Updated 6 months ago
- A framework for pitting LLMs against each other in an evolving library of games ★34 · Apr 20, 2025 · Updated last year
- ★74 · Sep 27, 2024 · Updated last year
- A lightweight adjustment tool for smoothing token probabilities in the Qwen models to encourage balanced multilingual generation. ★105 · Jul 9, 2025 · Updated 10 months ago
- A framework for few-shot evaluation of language models. ★36 · Apr 3, 2026 · Updated last month
- Synthetic Text Dataset Generation for LLM projects ★58 · Apr 17, 2026 · Updated last month
- Python library to use Pleias-RAG models ★71 · May 8, 2026 · Updated last week
- Exploring Applications of GRPO ★252 · Aug 25, 2025 · Updated 8 months ago
- Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verifi… ★3,217 · Apr 27, 2026 · Updated 3 weeks ago
- ★26 · May 7, 2026 · Updated last week
- Code for evaluating with Flow-Judge-v0.1 - an open-source, lightweight (3.8B) language model optimized for LLM system evaluations. Crafte… ★86 · Oct 29, 2024 · Updated last year
- Let's build better datasets, together! ★272 · Apr 2, 2026 · Updated last month
- Agentic RL Training at Scale ★1,384 · Updated this week
- Hugging Face Inference Toolkit used to serve transformers, sentence-transformers, and diffusers models. ★93 · Apr 15, 2026 · Updated last month
- Synthetic data curation for post-training and structured data extraction ★1,675 · Apr 18, 2026 · Updated last month
- ★107 · Nov 1, 2025 · Updated 6 months ago
- The LLM Evaluation Framework ★15,497 · Updated this week
- 🤗 smolagents: a barebones library for agents that think in code. ★27,255 · Apr 24, 2026 · Updated 3 weeks ago
- A course on aligning smol models. ★6,643 · Apr 17, 2026 · Updated last month
- ★124 · Apr 17, 2026 · Updated last month
- Robust recipes to align language models with human and AI preferences ★5,602 · Apr 8, 2026 · Updated last month
- The most modern LLM evaluation toolkit ★69 · Apr 30, 2026 · Updated 2 weeks ago