evanmiller / LLM-Reading-List
LLM papers I'm reading, mostly on inference and model compression
☆688 Updated 8 months ago
Related projects:
- What would you do with 1000 H100s... ☆816 Updated 8 months ago
- Deep learning for dummies. All the practical details and useful utilities that go into working with real models. ☆662 Updated last month
- An ML Systems Onboarding list ☆491 Updated last month
- Puzzles for learning Triton ☆966 Updated this week
- ☆442 Updated 3 weeks ago
- 🤖 A PyTorch library of curated Transformer models and their composable components ☆861 Updated 5 months ago
- ☆1,164 Updated last week
- [ICML 2024] Break the Sequential Dependency of LLM Inference Using Lookahead Decoding ☆1,099 Updated 7 months ago
- Fine-tune mistral-7B on 3090s, a100s, h100s ☆701 Updated 11 months ago
- nanoGPT style version of Llama 3.1 ☆1,162 Updated last month
- System 2 Reasoning Link Collection ☆597 Updated this week
- ☆856 Updated 9 months ago
- Llama from scratch, or How to implement a paper without crying ☆499 Updated 3 months ago
- Curate better data for LLMs ☆934 Updated 6 months ago
- A family of open-sourced Mixture-of-Experts (MoE) Large Language Models ☆1,352 Updated 6 months ago
- CUDA related news and material links ☆1,079 Updated 2 weeks ago
- Best practices for distilling large language models. ☆370 Updated 7 months ago
- A benchmark to evaluate language models on questions I've previously asked them to solve. ☆871 Updated this week
- Serving multiple LoRA finetuned LLM as one ☆946 Updated 4 months ago
- Alex Krizhevsky's original code from Google Code ☆185 Updated 8 years ago
- Minimalistic large language model 3D-parallelism training ☆1,111 Updated this week
- Building blocks for foundation models. ☆345 Updated 8 months ago
- Flash Attention in ~100 lines of CUDA (forward pass only) ☆558 Updated 5 months ago
- Official implementation of Half-Quadratic Quantization (HQQ) ☆659 Updated this week
- ☆409 Updated 10 months ago
- The repository for the code of the UltraFastBERT paper ☆508 Updated 5 months ago
- Generate textbook-quality synthetic LLM pretraining data ☆479 Updated 11 months ago
- MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases. In ICML 2024. ☆929 Updated 2 weeks ago
- YaRN: Efficient Context Window Extension of Large Language Models ☆1,306 Updated 5 months ago
- Extend existing LLMs way beyond the original training length with constant memory usage, without retraining ☆657 Updated 5 months ago