A benchmark to evaluate language models on questions I've previously asked them to solve.
☆1,042 · Apr 27, 2025 · Updated 10 months ago
Alternatives and similar repositories for yet-another-applied-llm-benchmark
Users interested in yet-another-applied-llm-benchmark are comparing it to the repositories listed below.
- RuLES: a benchmark for evaluating rule-following in language models ☆249 · Feb 24, 2025 · Updated last year
- Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization. ☆10,358 · Jul 1, 2024 · Updated last year
- Simple and efficient PyTorch-native transformer text generation in <1000 LOC of Python. ☆6,185 · Aug 22, 2025 · Updated 6 months ago
- A framework for few-shot evaluation of language models. ☆11,618 · Updated this week
- Fast bare-bones BPE for modern tokenizer training ☆176 · Jun 23, 2025 · Updated 8 months ago
- DSPy: The framework for programming, not prompting, language models ☆32,519 · Updated this week
- Structured Outputs ☆13,488 · Mar 2, 2026 · Updated last week
- Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks. ☆2,915 · Updated this week
- Official implementation for the paper: "Code Generation with AlphaCodium: From Prompt Engineering to Flow Engineering" ☆3,923 · Nov 25, 2024 · Updated last year
- Easily use and train state-of-the-art late-interaction retrieval methods (ColBERT) in any RAG pipeline. Designed for modularity and ease-… ☆3,868 · May 17, 2025 · Updated 9 months ago
- Tools for merging pretrained large language models. ☆6,842 · Feb 28, 2026 · Updated last week
- Robust recipes to align language models with human and AI preferences ☆5,510 · Sep 8, 2025 · Updated 6 months ago
- Go ahead and axolotl questions ☆11,395 · Updated this week
- A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton. ☆595 · Aug 12, 2025 · Updated 6 months ago
- Structured outputs for LLMs ☆12,468 · Feb 25, 2026 · Updated last week
- What would you do with 1000 H100s... ☆1,155 · Jan 10, 2024 · Updated 2 years ago
- The official PyTorch implementation of Google's Gemma models ☆5,606 · May 30, 2025 · Updated 9 months ago
- Schedule-Free Optimization in PyTorch ☆2,262 · May 21, 2025 · Updated 9 months ago
- Minimalistic large language model 3D-parallelism training ☆2,588 · Feb 19, 2026 · Updated 2 weeks ago
- Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends ☆2,324 · Updated this week
- 20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale. ☆13,206 · Mar 1, 2026 · Updated last week
- Training LLMs with QLoRA + FSDP ☆1,538 · Nov 9, 2024 · Updated last year
- The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens. ☆8,902 · May 3, 2024 · Updated last year
- Tile primitives for speedy kernels ☆3,202 · Feb 24, 2026 · Updated last week
- Our library for RL environments + evals ☆3,877 · Updated this week
- PyTorch native post-training library ☆5,697 · Updated this week
- A guidance language for controlling large language models. ☆21,333 · Feb 13, 2026 · Updated 3 weeks ago
- Holistic Evaluation of Language Models (HELM) is an open source Python framework created by the Center for Research on Foundation Models … ☆2,702 · Updated this week
- ☆4,390 · Jul 31, 2025 · Updated 7 months ago
- SWE-agent takes a GitHub issue and tries to automatically fix it, using your LM of choice. It can also be employed for offensive cybersec… ☆18,655 · Mar 2, 2026 · Updated last week
- A bibliography and survey of the papers surrounding o1 ☆1,212 · Nov 16, 2024 · Updated last year
- The official implementation of Self-Play Fine-Tuning (SPIN) ☆1,235 · May 8, 2024 · Updated last year
- Inspect: A framework for large language model evaluations ☆1,800 · Updated this week
- NanoGPT (124M) in 2 minutes ☆4,734 · Feb 27, 2026 · Updated last week
- [EMNLP'23, ACL'24] To speed up LLMs' inference and enhance LLMs' perception of key information, compress the prompt and KV-Cache, which ach… ☆5,892 · Oct 28, 2025 · Updated 4 months ago
- A language for constraint-guided and efficient LLM programming. ☆4,155 · May 22, 2025 · Updated 9 months ago
- Fine-tune Mistral-7B on 3090s, A100s, H100s ☆724 · Oct 11, 2023 · Updated 2 years ago
- AllenAI's post-training codebase ☆3,614 · Updated this week
- Entropy Based Sampling and Parallel CoT Decoding ☆3,432 · Nov 13, 2024 · Updated last year