huggingface / evaluation-guidebook
Sharing both practical insights and theoretical knowledge about LLM evaluation that we gathered while managing the Open LLM Leaderboard and designing lighteval!
★ 1,564 · Updated 8 months ago
Alternatives and similar repositories for evaluation-guidebook
Users who are interested in evaluation-guidebook are comparing it to the libraries listed below.
- A reading list on LLM based Synthetic Data Generation 🔥 · ★ 1,407 · Updated 3 months ago
- Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends · ★ 1,890 · Updated this week
- Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verifi… · ★ 2,878 · Updated this week
- Textbook on reinforcement learning from human feedback · ★ 1,205 · Updated last week
- Synthetic data curation for post-training and structured data extraction · ★ 1,495 · Updated last month
- ★ 679 · Updated 4 months ago
- Tool for generating high quality Synthetic datasets · ★ 1,183 · Updated last month
- ★ 1,967 · Updated last week
- A lightweight, low-dependency, unified API to use all common reranking and cross-encoder models. · ★ 1,528 · Updated 3 months ago
- Evaluate your LLM's response with Prometheus and GPT4 💯 · ★ 981 · Updated 4 months ago
- Best practices for distilling large language models. · ★ 574 · Updated last year
- System 2 Reasoning Link Collection · ★ 852 · Updated 5 months ago
- Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks. · ★ 2,580 · Updated 2 weeks ago
- Automatically evaluate your LLMs in Google Colab · ★ 658 · Updated last year
- Deep learning for dummies. All the practical details and useful utilities that go into working with real models. · ★ 814 · Updated last month
- Fast lexical search implementing BM25 in Python using Numpy, Numba and Scipy · ★ 1,309 · Updated this week
- Doing simple retrieval from LLM models at various context lengths to measure accuracy · ★ 2,004 · Updated last year
- Stanford NLP Python library for Representation Finetuning (ReFT) · ★ 1,511 · Updated 7 months ago
- Benchmark Large Language Models Reliably On Your Data · ★ 391 · Updated last week
- Curated list of datasets and tools for post-training. · ★ 3,671 · Updated last month
- Inspect: A framework for large language model evaluations · ★ 1,304 · Updated this week
- ★ 1,279 · Updated 6 months ago
- Verifiers for LLM Reinforcement Learning · ★ 2,981 · Updated this week
- TextGrad: Automatic "Differentiation" via Text -- using large language models to backpropagate textual gradients. Published in Nature. · ★ 2,902 · Updated last month
- Fast Semantic Text Deduplication & Filtering · ★ 800 · Updated this week
- DataDreamer: Prompt. Generate Synthetic Data. Train & Align Models. · ★ 1,057 · Updated 7 months ago
- Implementing the 4 agentic patterns from scratch · ★ 1,538 · Updated 5 months ago
- Automated Evaluation of RAG Systems · ★ 654 · Updated 5 months ago
- A collection of notebooks/recipes showcasing usecases of open-source models with Together AI. · ★ 1,046 · Updated 3 weeks ago
- Open-source AI cookbook · ★ 2,252 · Updated this week