epfml / llm-baselines
nanoGPT-like codebase for LLM training
⭐83 · Updated this week
Alternatives and similar repositories for llm-baselines:
Users interested in llm-baselines are comparing it to the libraries listed below.
- A MAD laboratory to improve AI architecture designs 🧪 ⭐102 · Updated last month
- Understand and test language model architectures on synthetic tasks. ⭐175 · Updated this week
- Code for NeurIPS 2024 Spotlight: "Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations" ⭐66 · Updated 2 months ago
- Revisiting Efficient Training Algorithms for Transformer-based Language Models (NeurIPS 2023) ⭐80 · Updated last year
- Code for studying the super weight in LLMs ⭐68 · Updated last month
- A fusion of a linear layer and a cross entropy loss, written for pytorch in triton. ⭐61 · Updated 5 months ago
- Language models scale reliably with over-training and on downstream tasks ⭐96 · Updated 9 months ago
- The simplest, fastest repository for training/finetuning medium-sized GPTs. ⭐90 · Updated last month
- Yet another random morning idea to be quickly tried and architecture shared if it works; to allow the transformer to pause for any amount… ⭐51 · Updated last year
- Open source replication of Anthropic's Crosscoders for Model Diffing ⭐28 · Updated 2 months ago
- Universal Neurons in GPT2 Language Models ⭐27 · Updated 7 months ago
- Code for exploring Based models from "Simple linear attention language models balance the recall-throughput tradeoff" ⭐219 · Updated last month
- PyTorch library for Active Fine-Tuning ⭐52 · Updated last week
- Influence Functions with (Eigenvalue-corrected) Kronecker-Factored Approximate Curvature ⭐120 · Updated 5 months ago
- Create feature-centric and prompt-centric visualizations for sparse autoencoders (like those from Anthropic's published research). ⭐176 · Updated last month
- Why Do We Need Weight Decay in Modern Deep Learning? [NeurIPS 2024] ⭐58 · Updated 3 months ago
- Sparse and discrete interpretability tool for neural networks ⭐58 · Updated 11 months ago