carlini/yet-another-applied-llm-benchmark

A benchmark to evaluate language models on questions I've previously asked them to solve.

☆1,061

Alternatives and similar repositories for yet-another-applied-llm-benchmark

Users interested in yet-another-applied-llm-benchmark are comparing it to the repositories listed below.

normster / llm_rules
RuLES: a benchmark for evaluating rule-following in language models
☆255 · Feb 24, 2025 · Updated last year
gautierdag / bpeasy
Fast bare-bones BPE for modern tokenizer training
☆179 · Jun 23, 2025 · Updated last year
karpathy / minbpe
Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization.
☆10,606 · Jul 1, 2024 · Updated 2 years ago
google / gemma_pytorch
The official PyTorch implementation of Google's Gemma models
☆5,707 · May 30, 2025 · Updated last year
meta-pytorch / gpt-fast
Simple and efficient pytorch-native transformer text generation in <1000 LOC of python.
☆6,228 · Aug 22, 2025 · Updated 10 months ago
isafulf / inbox_cleaner
A python script to help manage a Gmail inbox by filtering out promotional emails using GPT-3 or GPT-4.
☆465 · Dec 2, 2023 · Updated 2 years ago
stanfordnlp / dspy
DSPy: The framework for programming—not prompting—language models
☆35,904 · Updated this week
dottxt-ai / outlines
Structured Outputs
☆14,399 · Updated this week
ash-01xor / bpe.c
Simple Byte Pair Encoding mechanism used for the tokenization process, written purely in C
☆151 · Nov 11, 2024 · Updated last year
teichman / teichman-ros-pkg
☆10 · Sep 30, 2015 · Updated 10 years ago
Codium-ai / AlphaCodium
Official implementation for the paper: "Code Generation with AlphaCodium: From Prompt Engineering to Flow Engineering"
☆3,945 · Nov 25, 2024 · Updated last year
BobMcDear / attorch
A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton.
☆606 · May 13, 2026 · Updated last month
EleutherAI / lm-evaluation-harness
A framework for few-shot evaluation of language models.
☆13,205 · Jun 24, 2026 · Updated 2 weeks ago
huggingface / datatrove
Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.
☆3,149 · Jul 3, 2026 · Updated last week
AnswerDotAI / RAGatouille
Easily use and train state of the art late-interaction retrieval methods (ColBERT) in any RAG pipeline. Designed for modularity and ease-…
☆3,938 · May 17, 2025 · Updated last year
srush / LLM-Training-Puzzles
What would you do with 1000 H100s...
☆1,179 · Jan 10, 2024 · Updated 2 years ago
axolotl-ai-cloud / axolotl
Go ahead and axolotl questions
☆12,170 · Updated this week
huggingface / alignment-handbook
Robust recipes to align language models with human and AI preferences
☆5,627 · May 26, 2026 · Updated last month
arcee-ai / mergekit
Tools for merging pretrained large language models.
☆7,211 · Jun 17, 2026 · Updated 3 weeks ago
jzhang38 / TinyLlama
The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.
☆9,004 · May 3, 2024 · Updated 2 years ago
facebookresearch / schedule_free
Schedule-Free Optimization in PyTorch
☆2,310 · Jun 18, 2026 · Updated 3 weeks ago
carlini / pycallcc
Discount jupyter.
☆52 · Mar 7, 2025 · Updated last year
567-labs / instructor
structured outputs for llms
☆13,420 · Updated this week
HazyResearch / ThunderKittens
Tile primitives for speedy kernels
☆3,516 · Jun 15, 2026 · Updated 3 weeks ago
PrimeIntellect-ai / verifiers
Our library for RL environments + evals
☆4,289 · Updated this week
Lightning-AI / litgpt
20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.
☆13,469 · Updated this week
abacaj / fine-tune-mistral
Fine-tune mistral-7B on 3090s, a100s, h100s
☆734 · Oct 11, 2023 · Updated 2 years ago
guidance-ai / guidance
A guidance language for controlling large language models.
☆21,530 · May 21, 2026 · Updated last month
meta-pytorch / torchtune
PyTorch native post-training library
☆5,780 · Jul 4, 2026 · Updated last week
EleutherAI / cookbook
Deep learning for dummies. All the practical details and useful utilities that go into working with real models.
☆844 · Mar 15, 2026 · Updated 3 months ago
KellerJordan / modded-nanogpt
NanoGPT (124M) in 90 seconds
☆5,478 · Jul 3, 2026 · Updated last week
huggingface / lighteval
Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends
☆2,467 · Jun 29, 2026 · Updated last week
stanford-crfm / helm
Holistic Evaluation of Language Models (HELM) is an open source Python framework created by the Center for Research on Foundation Models …
☆2,846 · Jul 1, 2026 · Updated last week
UKGovernmentBEIS / inspect_ai
Inspect: A framework for large language model evaluations
☆2,321 · Updated this week
huggingface / nanotron
Minimalistic large language model 3D-parallelism training
☆2,737 · May 26, 2026 · Updated last month
SWE-agent / SWE-agent
SWE-agent takes a GitHub issue and tries to automatically fix it, using your LM of choice. It can also be employed for offensive cybersec…
☆19,759 · Updated this week
srush / awesome-o1
A bibliography and survey of the papers surrounding o1
☆1,213 · Nov 16, 2024 · Updated last year
uclaml / SPIN
The official implementation of Self-Play Fine-Tuning (SPIN)
☆1,247 · May 8, 2024 · Updated 2 years ago
xjdr-alt / entropix
Entropy Based Sampling and Parallel CoT Decoding
☆3,433 · Nov 13, 2024 · Updated last year