huggingface / yourbench
🤗 Benchmark Large Language Models Reliably On Your Data
☆419 Updated last week
Alternatives and similar repositories for yourbench
Users interested in yourbench are comparing it to the libraries listed below.
- Build datasets using natural language ☆556 Updated 3 months ago
- Collection of scripts and notebooks for OpenAI's latest GPT OSS models ☆486 Updated 4 months ago
- awesome synthetic (text) datasets ☆315 Updated last month
- Simple UI for debugging correlations of text embeddings ☆306 Updated 7 months ago
- Automatically evaluate your LLMs in Google Colab ☆677 Updated last year
- Recipes for learning, fine-tuning, and adapting ColPali to your multimodal RAG use cases. 👨🏻‍🍳 ☆348 Updated 6 months ago
- A small library of LLM judges ☆311 Updated 4 months ago
- ☆138 Updated 4 months ago
- A Lightweight Library for AI Observability ☆252 Updated 10 months ago
- An Open Source Toolkit For LLM Distillation ☆814 Updated last week
- Banishing LLM Hallucinations Requires Rethinking Generalization ☆276 Updated last year
- A compact LLM pretrained in 9 days by using high quality data ☆337 Updated 8 months ago
- An interface library for RL post training with environments. ☆859 Updated this week
- ☆235 Updated last month
- ☆693 Updated 8 months ago
- ☆159 Updated 8 months ago
- ☆120 Updated last year
- Hypernetworks that adapt LLMs for specific benchmark tasks using only textual task description as the input ☆933 Updated 6 months ago
- ☆160 Updated last year
- Fast Semantic Text Deduplication & Filtering ☆859 Updated 2 months ago
- Late Interaction Models Training & Retrieval ☆677 Updated 2 weeks ago
- Attribute (or cite) statements generated by LLMs back to in-context information. ☆312 Updated last year
- A comprehensive repository of reasoning tasks for LLMs (and beyond) ☆453 Updated last year
- TapeAgents is a framework that facilitates all stages of the LLM Agent development lifecycle ☆300 Updated 2 weeks ago
- Let's build better datasets, together! ☆267 Updated last year
- Tutorial for building LLM router ☆239 Updated last year
- A library for easily merging multiple LLM experts and efficiently training the merged LLM. ☆500 Updated last year
- An open-source tool for LLM prompt optimization. ☆734 Updated last week
- Automatic evals for LLMs ☆569 Updated last week
- Code to accompany the Universal Deep Research paper (https://arxiv.org/abs/2509.00244) ☆455 Updated 4 months ago