carlini / yet-another-applied-llm-benchmark
A benchmark to evaluate language models on questions I've previously asked them to solve.
☆1,042Apr 27, 2025Updated 9 months ago
Alternatives and similar repositories for yet-another-applied-llm-benchmark
Users who are interested in yet-another-applied-llm-benchmark are comparing it to the libraries listed below.
- RuLES: a benchmark for evaluating rule-following in language models☆249Feb 24, 2025Updated 11 months ago
- Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization.☆10,309Jul 1, 2024Updated last year
- Simple and efficient pytorch-native transformer text generation in <1000 LOC of python.☆6,180Aug 22, 2025Updated 5 months ago
- A framework for few-shot evaluation of language models.☆11,393Updated this week
- Fast bare-bones BPE for modern tokenizer training☆175Jun 23, 2025Updated 7 months ago
- DSPy: The framework for programming—not prompting—language models☆32,156Updated this week
- Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.☆2,885Updated this week
- Structured Outputs☆13,403Feb 6, 2026Updated last week
- Official implementation for the paper: "Code Generation with AlphaCodium: From Prompt Engineering to Flow Engineering"☆3,920Nov 25, 2024Updated last year
- Easily use and train state of the art late-interaction retrieval methods (ColBERT) in any RAG pipeline. Designed for modularity and ease-…☆3,852May 17, 2025Updated 8 months ago
- A python script to help manage a Gmail inbox by filtering out promotional emails using GPT-3 or GPT-4.☆458Dec 2, 2023Updated 2 years ago
- Tools for merging pretrained large language models.☆6,783Jan 26, 2026Updated 3 weeks ago
- Robust recipes to align language models with human and AI preferences☆5,495Sep 8, 2025Updated 5 months ago
- Go ahead and axolotl questions☆11,289Updated this week
- A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton.☆596Aug 12, 2025Updated 6 months ago
- structured outputs for llms☆12,357Updated this week
- What would you do with 1000 H100s...☆1,153Jan 10, 2024Updated 2 years ago
- The official PyTorch implementation of Google's Gemma models☆5,602May 30, 2025Updated 8 months ago
- Schedule-Free Optimization in PyTorch☆2,256May 21, 2025Updated 8 months ago
- Minimalistic large language model 3D-parallelism training☆2,559Updated this week
- Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends☆2,293Jan 21, 2026Updated 3 weeks ago
- 20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.☆13,155Feb 8, 2026Updated last week
- Training LLMs with QLoRA + FSDP☆1,537Nov 9, 2024Updated last year
- Tile primitives for speedy kernels☆3,139Updated this week
- The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.☆8,889May 3, 2024Updated last year
- Our library for RL environments + evals☆3,833Updated this week
- PyTorch native post-training library☆5,679Updated this week
- A guidance language for controlling large language models.☆21,270Feb 6, 2026Updated last week
- Holistic Evaluation of Language Models (HELM) is an open source Python framework created by the Center for Research on Foundation Models …☆2,667Feb 9, 2026Updated last week
- SWE-agent takes a GitHub issue and tries to automatically fix it, using your LM of choice. It can also be employed for offensive cybersec…☆18,478Feb 10, 2026Updated last week
- ☆4,346Jul 31, 2025Updated 6 months ago
- A bibliography and survey of the papers surrounding o1☆1,212Nov 16, 2024Updated last year
- Inspect: A framework for large language model evaluations☆1,737Updated this week
- The official implementation of Self-Play Fine-Tuning (SPIN)☆1,234May 8, 2024Updated last year
- [EMNLP'23, ACL'24] To speed up LLM inference and enhance LLMs' perception of key information, compress the prompt and KV-Cache, which ach…☆5,834Oct 28, 2025Updated 3 months ago
- Simple byte pair encoding mechanism used for the tokenization process, written purely in C☆146Nov 11, 2024Updated last year
- NanoGPT (124M) in 2 minutes☆4,624Updated this week
- A language for constraint-guided and efficient LLM programming.☆4,148May 22, 2025Updated 8 months ago
- Fine-tune Mistral-7B on 3090s, A100s, H100s☆725Oct 11, 2023Updated 2 years ago