stanford-cs336 / spring2024-lectures
☆218 Updated 2 months ago
Alternatives and similar repositories for spring2024-lectures
Users who are interested in spring2024-lectures are comparing it to the repositories listed below:
- ☆81 Updated 5 months ago
- NeurIPS Large Language Model Efficiency Challenge: 1 LLM + 1GPU + 1Day ☆255 Updated last year
- The official implementation of the paper "What Matters in Transformers? Not All Attention is Needed". ☆162 Updated 3 months ago
- ☆145 Updated 2 weeks ago
- Building blocks for foundation models. ☆456 Updated last year
- A brief and partial summary of RLHF algorithms. ☆115 Updated this week
- The official evaluation suite and dynamic data release for MixEval. ☆231 Updated 3 months ago
- Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024 ☆275 Updated last week
- ☆146 Updated last year
- 🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash… ☆226 Updated last week
- open-source code for paper: Retrieval Head Mechanistically Explains Long-Context Factuality ☆175 Updated 7 months ago
- A bibliography and survey of the papers surrounding o1 ☆1,174 Updated 3 months ago
- LLM-Merging: Building LLMs Efficiently through Merging ☆191 Updated 5 months ago
- LoRA and DoRA from Scratch Implementations ☆197 Updated last year
- Scaling Data-Constrained Language Models ☆334 Updated 5 months ago
- What would you do with 1000 H100s... ☆1,011 Updated last year
- ☆172 Updated last year
- A Survey on Data Selection for Language Models ☆213 Updated 4 months ago
- Repo for Rho-1: Token-level Data Selection & Selective Pretraining of LLMs. ☆400 Updated 10 months ago
- Direct Preference Optimization from scratch in PyTorch ☆84 Updated last year
- Explorations into some recent techniques surrounding speculative decoding ☆245 Updated 2 months ago
- Best practices & guides on how to write distributed pytorch training code ☆359 Updated last week
- ☆394 Updated 7 months ago
- ☆272 Updated 5 months ago
- Distributed training (multi-node) of a Transformer model ☆54 Updated 10 months ago
- A Telegram bot to recommend arXiv papers ☆249 Updated 3 weeks ago
- RewardBench: the first evaluation tool for reward models. ☆518 Updated this week
- ring-attention experiments ☆126 Updated 4 months ago
- Implementation of paper Data Engineering for Scaling Language Models to 128K Context ☆452 Updated 11 months ago