huggingface / evaluation-guidebookLinks

Sharing both practical insights and theoretical knowledge about LLM evaluation that we gathered while managing the Open LLM Leaderboard and designing lighteval!

☆1,498

Alternatives and similar repositories for evaluation-guidebook

Users that are interested in evaluation-guidebook are comparing it to the libraries listed below

Sorting:

wasiahmad / Awesome-LLM-Synthetic-Data
A reading list on LLM based Synthetic Data Generation 🔥
☆1,369Updated last month
huggingface / lighteval
Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends
☆1,766Updated this week
huggingface / huggingface-llama-recipes
☆677Updated 3 months ago
argilla-io / distilabel
Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verifi…
☆2,821Updated this week
meta-llama / synthetic-data-kit
Tool for generating high quality Synthetic datasets
☆1,081Updated last week
natolambert / rlhf-book
Textbook on reinforcement learning from human feedback
☆1,137Updated last week
bespokelabsai / curator
Synthetic data curation for post-training and structured data extraction
☆1,464Updated 3 weeks ago
predibase / llm_distillation_playbook
Best practices for distilling large language models.
☆568Updated last year
huggingface / datatrove
Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.
☆2,516Updated this week
prometheus-eval / prometheus-eval
Evaluate your LLM's response with Prometheus and GPT4 💯
☆974Updated 3 months ago
mlabonne / llm-datasets
Curated list of datasets and tools for post-training.
☆3,295Updated 6 months ago
willccbb / verifiers
Verifiers for LLM Reinforcement Learning
☆1,621Updated this week
open-thought / system-2-research
System 2 Reasoning Link Collection
☆848Updated 4 months ago
togethercomputer / together-cookbook
A collection of notebooks/recipes showcasing usecases of open-source models with Together AI.
☆987Updated last week
trotsky1997 / MathBlackBox
☆1,028Updated 7 months ago
mistralai / cookbook
☆1,927Updated this week
huggingface / yourbench
🤗 Benchmark Large Language Models Reliably On Your Data
☆367Updated this week
merveenoyan / smol-vision
Recipes for shrinking, optimizing, customizing cutting edge vision models. 💜
☆1,540Updated last week
mlabonne / llm-autoeval
Automatically evaluate your LLMs in Google Colab
☆649Updated last year
gkamradt / LLMTest_NeedleInAHaystack
Doing simple retrieval from LLM models at various context lengths to measure accuracy
☆1,951Updated 11 months ago
philschmid / deep-learning-pytorch-huggingface
☆1,254Updated 5 months ago
stanfordnlp / pyreft
Stanford NLP Python library for Representation Finetuning (ReFT)
☆1,500Updated 5 months ago
AnswerDotAI / ModernBERT
Bringing BERT into modernity via both architecture changes and scaling
☆1,469Updated last month
AnswerDotAI / rerankers
A lightweight, low-dependency, unified API to use all common reranking and cross-encoder models.
☆1,505Updated 2 months ago
EleutherAI / cookbook
Deep learning for dummies. All the practical details and useful utilities that go into working with real models.
☆808Updated 2 weeks ago
UKGovernmentBEIS / inspect_ai
Inspect: A framework for large language model evaluations
☆1,179Updated this week
huggingface / search-and-learn
Recipes to scale inference-time compute of open models
☆1,110Updated 2 months ago
rasbt / LLM-workshop-2024
A 4-hour coding workshop to understand how LLMs are implemented and used
☆992Updated 6 months ago
argilla-io / synthetic-data-generator
Build datasets using natural language
☆505Updated 2 months ago
carlini / yet-another-applied-llm-benchmark
A benchmark to evaluate language models on questions I've previously asked them to solve.
☆1,023Updated 3 months ago