huggingface / evaluation-guidebook
Sharing both practical insights and theoretical knowledge about LLM evaluation that we gathered while managing the Open LLM Leaderboard and designing lighteval!
⭐1,456 · Updated 6 months ago
Alternatives and similar repositories for evaluation-guidebook
Users interested in evaluation-guidebook are comparing it to the libraries listed below:
- A reading list on LLM-based synthetic data generation 🔥 · ⭐1,328 · Updated last month
- ⭐673 · Updated 2 months ago
- Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verifi… · ⭐2,796 · Updated last week
- Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends · ⭐1,684 · Updated last week
- Textbook on reinforcement learning from human feedback · ⭐1,083 · Updated this week
- ⭐1,857 · Updated last week
- Synthetic data curation for post-training and structured data extraction · ⭐1,434 · Updated this week
- Tool for generating high-quality synthetic datasets · ⭐985 · Updated last week
- Best practices for distilling large language models. · ⭐561 · Updated last year
- Verifiers for LLM reinforcement learning · ⭐1,495 · Updated this week
- Curated list of datasets and tools for post-training. · ⭐3,244 · Updated 5 months ago
- A lightweight, low-dependency, unified API to use all common reranking and cross-encoder models. · ⭐1,492 · Updated last month
- Evaluate your LLM's response with Prometheus and GPT4 💯 · ⭐959 · Updated 2 months ago
- System 2 Reasoning Link Collection · ⭐841 · Updated 3 months ago
- A collection of notebooks/recipes showcasing use cases of open-source models with Together AI. · ⭐970 · Updated last week
- Automatically evaluate your LLMs in Google Colab · ⭐646 · Updated last year
- 🤗 Benchmark Large Language Models Reliably On Your Data · ⭐354 · Updated last week
- Inspect: A framework for large language model evaluations · ⭐1,134 · Updated this week
- Recipes for shrinking, optimizing, and customizing cutting-edge vision models. · ⭐1,507 · Updated last week
- TextGrad: Automatic "Differentiation" via Text -- using large language models to backpropagate textual gradients. · ⭐2,719 · Updated this week
- Bringing BERT into modernity via both architecture changes and scaling · ⭐1,430 · Updated last week
- Deep learning for dummies. All the practical details and useful utilities that go into working with real models. · ⭐801 · Updated last month
- Build datasets using natural language · ⭐498 · Updated 2 months ago
- A library for advanced large language model reasoning · ⭐2,174 · Updated last month
- Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks. · ⭐2,441 · Updated 2 weeks ago
- Recipes to scale inference-time compute of open models · ⭐1,101 · Updated last month
- Implementing the 4 agentic patterns from scratch · ⭐1,404 · Updated 3 months ago
- Fast State-of-the-Art Static Embeddings · ⭐1,752 · Updated last month
- ⭐1,025 · Updated 6 months ago
- Stanford NLP Python library for Representation Finetuning (ReFT) · ⭐1,493 · Updated 5 months ago