huggingface / evaluation-guidebook
Sharing both practical insights and theoretical knowledge about LLM evaluation that we gathered while managing the Open LLM Leaderboard and designing lighteval!
★1,647 Updated last week
Alternatives and similar repositories for evaluation-guidebook
Users who are interested in evaluation-guidebook are comparing it to the libraries listed below.
- Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends ★1,973 Updated this week
- A reading list on LLM based Synthetic Data Generation 🔥 ★1,423 Updated 3 months ago
- Textbook on reinforcement learning from human feedback ★1,242 Updated 2 weeks ago
- Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verifi… ★2,899 Updated this week
- ★682 Updated 5 months ago
- Synthetic data curation for post-training and structured data extraction ★1,511 Updated 2 months ago
- Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks. ★2,660 Updated this week
- Tool for generating high quality Synthetic datasets ★1,238 Updated this week
- ★1,995 Updated last week
- Best practices for distilling large language models. ★577 Updated last year
- Evaluate your LLM's response with Prometheus and GPT4 💯 ★995 Updated 5 months ago
- System 2 Reasoning Link Collection ★853 Updated 6 months ago
- Environments for LLM Reinforcement Learning ★3,222 Updated this week
- Curated list of datasets and tools for post-training. ★3,735 Updated 2 months ago
- 🤗 Benchmark Large Language Models Reliably On Your Data ★398 Updated this week
- Minimalistic large language model 3D-parallelism training ★2,239 Updated last month
- Automatically evaluate your LLMs in Google Colab ★661 Updated last year
- A lightweight, low-dependency, unified API to use all common reranking and cross-encoder models. ★1,543 Updated 4 months ago
- Bringing BERT into modernity via both architecture changes and scaling ★1,525 Updated 3 months ago
- TextGrad: Automatic ''Differentiation'' via Text -- using large language models to backpropagate textual gradients. Published in Nature. ★2,960 Updated 2 months ago
- Fast lexical search implementing BM25 in Python using Numpy, Numba and Scipy ★1,335 Updated 3 weeks ago
- AllenAI's post-training codebase ★3,222 Updated this week
- Fast State-of-the-Art Static Embeddings ★1,846 Updated 3 weeks ago
- Doing simple retrieval from LLM models at various context lengths to measure accuracy ★2,036 Updated last year
- A collection of notebooks/recipes showcasing usecases of open-source models with Together AI. ★1,056 Updated last month
- Optimizing inference proxy for LLMs ★2,951 Updated this week
- Open-source AI cookbook ★2,276 Updated this week
- Recipes to scale inference-time compute of open models ★1,109 Updated 4 months ago
- Deep learning for dummies. All the practical details and useful utilities that go into working with real models. ★813 Updated 2 months ago
- Inspect: A framework for large language model evaluations ★1,350 Updated this week