huggingface / evaluation-guidebook
Sharing both practical insights and theoretical knowledge about LLM evaluation that we gathered while managing the Open LLM Leaderboard and designing lighteval!
★2,029 · Updated last month
Alternatives and similar repositories for evaluation-guidebook
Users who are interested in evaluation-guidebook are comparing it to the libraries listed below.
- Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends ★2,251 · Updated this week
- A reading list on LLM based Synthetic Data Generation 🔥 ★1,496 · Updated 7 months ago
- Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verifi… ★3,039 · Updated 3 weeks ago
- Textbook on reinforcement learning from human feedback ★1,396 · Updated this week
- ★695 · Updated 8 months ago
- Synthetic data curation for post-training and structured data extraction ★1,594 · Updated last week
- Best practices for distilling large language models. ★599 · Updated last year
- Tool for generating high quality Synthetic datasets ★1,463 · Updated 2 months ago
- ★2,138 · Updated 3 weeks ago
- Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks. ★2,812 · Updated this week
- Curated list of datasets and tools for post-training. ★4,149 · Updated 2 months ago
- 🤗 Benchmark Large Language Models Reliably On Your Data ★423 · Updated 2 weeks ago
- Our library for RL environments + evals ★3,730 · Updated this week
- Doing simple retrieval from LLM models at various context lengths to measure accuracy ★2,137 · Updated last year
- Minimalistic large language model 3D-parallelism training ★2,411 · Updated last month
- System 2 Reasoning Link Collection ★867 · Updated 9 months ago
- A collection of notebooks/recipes showcasing usecases of open-source models with Together AI. ★1,089 · Updated this week
- A lightweight, low-dependency, unified API to use all common reranking and cross-encoder models. ★1,587 · Updated 3 weeks ago
- Bringing BERT into modernity via both architecture changes and scaling ★1,607 · Updated 6 months ago
- Automatically evaluate your LLMs in Google Colab ★679 · Updated last year
- Evaluate your LLM's response with Prometheus and GPT4 💯 ★1,029 · Updated 8 months ago
- Deep learning for dummies. All the practical details and useful utilities that go into working with real models. ★829 · Updated 5 months ago
- Fast lexical search implementing BM25 in Python using Numpy, Numba and Scipy ★1,447 · Updated 3 weeks ago
- ★1,334 · Updated 10 months ago
- Recipes to scale inference-time compute of open models ★1,123 · Updated 7 months ago
- Recipes for shrinking, optimizing, customizing cutting edge vision models. ★1,826 · Updated 2 months ago
- Stanford NLP Python library for Representation Finetuning (ReFT) ★1,551 · Updated this week
- Inspect: A framework for large language model evaluations ★1,644 · Updated this week
- AllenAI's post-training codebase ★3,515 · Updated this week
- DataDreamer: Prompt. Generate Synthetic Data. Train & Align Models. 🤖💤 ★1,088 · Updated 11 months ago