carlini / yet-another-applied-llm-benchmark
A benchmark to evaluate language models on questions I've previously asked them to solve.
☆1,011Updated last month
Alternatives and similar repositories for yet-another-applied-llm-benchmark
Users who are interested in yet-another-applied-llm-benchmark are comparing it to the repositories listed below.
- Deep learning for dummies. All the practical details and useful utilities that go into working with real models.☆793Updated last month
- Fine-tune Mistral-7B on 3090s, A100s, H100s☆711Updated last year
- System 2 Reasoning Link Collection☆835Updated 2 months ago
- Automatically evaluate your LLMs in Google Colab☆629Updated last year
- Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.☆2,390Updated this week
- Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends☆1,574Updated this week
- ☆447Updated last year
- Generate textbook-quality synthetic LLM pretraining data☆498Updated last year
- ☆517Updated 6 months ago
- ☆722Updated last week
- A library for making RepE control vectors☆593Updated 4 months ago
- A complete end-to-end pipeline for LLM interpretability with sparse autoencoders (SAEs) using Llama 3.2, written in pure PyTorch and full…☆614Updated 2 months ago
- ☆412Updated last year
- Inspect: A framework for large language model evaluations☆979Updated this week
- ☆893Updated 8 months ago
- Implementation of the training framework proposed in Self-Rewarding Language Model, from MetaAI☆1,384Updated last year
- YaRN: Efficient Context Window Extension of Large Language Models☆1,489Updated last year
- A tool for evaluating LLMs☆418Updated last year
- Evaluate your LLM's response with Prometheus and GPT4 💯☆948Updated last month
- What would you do with 1000 H100s...☆1,048Updated last year
- LLM Analytics☆664Updated 7 months ago
- Optimizing inference proxy for LLMs☆2,427Updated this week
- [ICLR 2025] Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling☆876Updated last month
- Minimalistic large language model 3D-parallelism training☆1,898Updated this week
- Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verifi…☆2,724Updated this week
- ☆536Updated 9 months ago
- Recipes to scale inference-time compute of open models☆1,087Updated last week
- DataDreamer: Prompt. Generate Synthetic Data. Train & Align Models. 🤖💤☆1,018Updated 4 months ago
- ☆1,024Updated 5 months ago
- Reaching LLaMA2 Performance with 0.1M Dollars☆979Updated 10 months ago