huggingface / evaluation-guidebook
Sharing both practical insights and theoretical knowledge about LLM evaluation that we gathered while managing the Open LLM Leaderboard and designing lighteval!
★2,007 Updated 3 weeks ago
Alternatives and similar repositories for evaluation-guidebook
Users who are interested in evaluation-guidebook are comparing it to the libraries listed below.
- Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends ★2,202 Updated last week
- A reading list on LLM based Synthetic Data Generation 🔥 ★1,491 Updated 6 months ago
- Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verifi… ★2,971 Updated last week
- Textbook on reinforcement learning from human feedback ★1,364 Updated this week
- Synthetic data curation for post-training and structured data extraction ★1,577 Updated 4 months ago
- ★693 Updated 7 months ago
- Curated list of datasets and tools for post-training. ★4,100 Updated last month
- A collection of notebooks/recipes showcasing use cases of open-source models with Together AI. ★1,080 Updated this week
- Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks. ★2,780 Updated this week
- ★2,117 Updated this week
- Best practices for distilling large language models. ★593 Updated last year
- 🤗 Benchmark Large Language Models Reliably On Your Data ★419 Updated last week
- Our library for RL environments + evals ★3,655 Updated this week
- ★1,333 Updated 9 months ago
- Minimalistic large language model 3D-parallelism training ★2,365 Updated last week
- Awesome Reasoning LLM Tutorial/Survey/Guide ★2,221 Updated 2 months ago
- System 2 Reasoning Link Collection ★861 Updated 9 months ago
- Tool for generating high quality Synthetic datasets ★1,427 Updated last month
- Deep learning for dummies. All the practical details and useful utilities that go into working with real models. ★829 Updated 4 months ago
- Evaluate your LLM's response with Prometheus and GPT4 💯 ★1,022 Updated 7 months ago
- AllenAI's post-training codebase ★3,456 Updated this week
- Doing simple retrieval from LLM models at various context lengths to measure accuracy ★2,117 Updated last year
- A lightweight, low-dependency, unified API to use all common reranking and cross-encoder models. ★1,584 Updated this week
- Open-source AI cookbook ★2,545 Updated last month
- A library for advanced large language model reasoning ★2,318 Updated 6 months ago
- Recipes for shrinking, optimizing, customizing cutting edge vision models. ★1,810 Updated last month
- Fast Semantic Text Deduplication & Filtering ★856 Updated last month
- Implementing the 4 agentic patterns from scratch ★1,650 Updated 9 months ago
- Recipes to scale inference-time compute of open models ★1,120 Updated 7 months ago
- Automatically evaluate your LLMs in Google Colab ★675 Updated last year