epfml / llm-baselinesLinks

nanoGPT-like codebase for LLM training

☆102

Alternatives and similar repositories for llm-baselines

Users that are interested in llm-baselines are comparing it to the libraries listed below

Sorting:

EleutherAI / nanoGPT-mup
The simplest, fastest repository for training/finetuning medium-sized GPTs.
☆149Updated last month
athms / mad-lab
A MAD laboratory to improve AI architecture designs 🧪
☆123Updated 7 months ago
HazyResearch / zoology
Understand and test language model architectures on synthetic tasks.
☆221Updated 3 weeks ago
berlino / seq_icl
☆53Updated last year
epfml / schedules-and-scaling
Code for NeurIPS 2024 Spotlight: "Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations"
☆81Updated 9 months ago
insuhan / hyper-attn
☆81Updated last year
mcleish7 / arithmetic
Code to reproduce "Transformers Can Do Arithmetic with the Right Embeddings", McLeish et al (NeurIPS 2024)
☆190Updated last year
cloneofsimo / min-fsdp
☆83Updated last year
HazyResearch / based
Code for exploring Based models from "Simple linear attention language models balance the recall-throughput tradeoff"
☆238Updated 2 months ago
mlfoundations / scaling
Language models scale reliably with over-training and on downstream tasks
☆97Updated last year
jonhue / activeft
PyTorch library for Active Fine-Tuning
☆87Updated 5 months ago
srush / do-we-need-attention
☆166Updated 2 years ago
microsoft / mutransformers
some common Huggingface transformers in maximal update parametrization (µP)
☆82Updated 3 years ago
PiotrNawrot / nano-sparse-attention
The simplest implementation of recent Sparse Attention patterns for efficient LLM inference.
☆82Updated 2 weeks ago
ApolloResearch / e2e_sae
Sparse Autoencoder Training Library
☆54Updated 3 months ago
lucidrains / pause-transformer
Yet another random morning idea to be quickly tried and architecture shared if it works; to allow the transformer to pause for any amount…
☆54Updated last year
JonasGeiping / linear_cross_entropy_loss
A fusion of a linear layer and a cross entropy loss, written for pytorch in triton.
☆70Updated last year
JeanKaddour / NoTrainNoGain
Revisiting Efficient Training Algorithms For Transformer-based Language Models (NeurIPS 2023)
☆80Updated last year
shikaiqiu / compute-better-spent
☆53Updated 10 months ago
ethansmith2000 / fsdp_optimizers
supporting pytorch FSDP for optimizers
☆84Updated 7 months ago
ckkissane / crosscoder-model-diff-replication
Open source replication of Anthropic's Crosscoders for Model Diffing
☆57Updated 9 months ago
cloneofsimo / min-max-gpt
Minimal (400 LOC) implementation Maximum (multi-node, FSDP) GPT training
☆130Updated last year
epfml / DenseFormer
☆81Updated last year
wesg52 / universal-neurons
Universal Neurons in GPT2 Language Models
☆30Updated last year
google-deepmind / mishax
☆136Updated 4 months ago
srush / triton-autodiff
Experiment of using Tangent to autodiff triton
☆79Updated last year
google-deepmind / asyncdiloco
☆45Updated last year
Edward-Sun / gpt-accelera
Simple and efficient pytorch-native transformer training and inference (batched)
☆78Updated last year
justinchiu / openlogprobs
Extract full next-token probabilities via language model APIs
☆247Updated last year
srush / mamba-primer
☆37Updated last year