huggingface / evaluation-guidebook
Sharing both practical insights and theoretical knowledge about LLM evaluation that we gathered while managing the Open LLM Leaderboard and designing lighteval!
☆1,019 · Updated last month
Alternatives and similar repositories for evaluation-guidebook:
Users who are interested in evaluation-guidebook are comparing it to the libraries listed below:
- A reading list on LLM based Synthetic Data Generation 🔥 ☆1,141 · Updated 3 months ago
- Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends ☆1,160 · Updated this week
- Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verifi… ☆2,431 · Updated last week
- ☆609 · Updated 2 months ago
- System 2 Reasoning Link Collection ☆793 · Updated 2 weeks ago
- A lightweight, low-dependency, unified API to use all common reranking and cross-encoder models. ☆1,292 · Updated last week
- ☆1,480 · Updated this week
- Fast lexical search implementing BM25 in Python using Numpy, Numba and Scipy ☆1,017 · Updated last month
- Use late-interaction multi-modal models such as ColPali in just a few lines of code. ☆732 · Updated 3 weeks ago
- Fast State-of-the-Art Static Embeddings ☆1,060 · Updated this week
- Evaluate your LLM's response with Prometheus and GPT4 💯 ☆870 · Updated last month
- Synthetic Data curation for post-training and structured data extraction ☆801 · Updated this week
- The code used to train and run inference with the ColPali architecture. ☆1,480 · Updated this week
- Implementing the 4 agentic patterns from scratch ☆1,023 · Updated 3 weeks ago
- Curated list of datasets and tools for post-training. ☆2,687 · Updated 3 weeks ago
- awesome synthetic (text) datasets ☆261 · Updated 3 months ago
- Recipes for shrinking, optimizing, customizing cutting edge vision models. ☆1,187 · Updated 2 weeks ago
- ☆1,004 · Updated 2 months ago
- Framework for enhancing LLMs for RAG tasks using fine-tuning. ☆707 · Updated this week
- Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks. ☆2,230 · Updated last week
- LLM Comparator is an interactive data visualization tool for evaluating and analyzing LLM responses side-by-side, developed by the PAIR t… ☆372 · Updated last week
- Best practices for distilling large language models. ☆476 · Updated last year
- Stanford NLP Python library for Representation Finetuning (ReFT) ☆1,418 · Updated last week
- AdalFlow: The library to build & auto-optimize LLM applications. ☆2,736 · Updated this week
- TextGrad: Automatic ''Differentiation'' via Text -- using large language models to backpropagate textual gradients. ☆2,085 · Updated 3 weeks ago
- Deep learning for dummies. All the practical details and useful utilities that go into working with real models. ☆772 · Updated 3 weeks ago
- Recipes to scale inference-time compute of open models ☆1,000 · Updated last month
- OO for LLMs ☆634 · Updated last week
- Fast Semantic Text Deduplication ☆525 · Updated this week
- Automatically evaluate your LLMs in Google Colab ☆590 · Updated 9 months ago