evanmiller / LLM-Reading-List
LLM papers I'm reading, mostly on inference and model compression
☆694 · Updated 10 months ago
Related projects
Alternatives and complementary repositories for LLM-Reading-List
- What would you do with 1000 H100s... ☆903 · Updated 10 months ago
- Deep learning for dummies. All the practical details and useful utilities that go into working with real models. ☆715 · Updated last month
- Minimalistic, extremely fast, and hackable researcher's toolbench for GPT models in 307 lines of code. Reaches <3.8 validation loss on wi… ☆334 · Updated 3 months ago
- An ML Systems Onboarding list ☆545 · Updated this week
- Fine-tune Mistral-7B on 3090s, A100s, H100s ☆702 · Updated last year
- Fast & Simple repository for pre-training and fine-tuning T5-style models ☆970 · Updated 3 months ago
- Puzzles for learning Triton ☆1,135 · Updated this week
- Best practices for distilling large language models. ☆397 · Updated 9 months ago
- A bibliography and survey of the papers surrounding o1 ☆754 · Updated this week
- Alex Krizhevsky's original code from Google Code ☆190 · Updated 8 years ago
- Finetuning Large Language Models on One Consumer GPU in 2 Bits ☆707 · Updated 5 months ago
- [ICML 2024] Break the Sequential Dependency of LLM Inference Using Lookahead Decoding ☆1,149 · Updated last month
- Puzzles for exploring transformers ☆325 · Updated last year
- A comprehensive deep dive into the world of tokens ☆214 · Updated 4 months ago
- System 2 Reasoning Link Collection ☆693 · Updated 3 weeks ago
- Official implementation of Half-Quadratic Quantization (HQQ) ☆701 · Updated last week
- Serving multiple LoRA fine-tuned LLMs as one ☆984 · Updated 6 months ago
- A family of open-source Mixture-of-Experts (MoE) Large Language Models ☆1,391 · Updated 8 months ago
- The Tensor (or Array) ☆411 · Updated 3 months ago
- NanoGPT (124M) quality in 7.8 8xH100-minutes ☆1,033 · Updated this week
- High Quality Resources on GPU Programming/Architecture ☆566 · Updated 3 months ago
- Batched LoRAs ☆336 · Updated last year
- An implementation of the transformer architecture as an Nvidia CUDA kernel ☆157 · Updated last year
- Following master Karpathy with a GPT-2 implementation and training run, writing lots of comments because I have the memory of a goldfish ☆167 · Updated 3 months ago
- Extend existing LLMs way beyond the original training length with constant memory usage, without retraining ☆675 · Updated 7 months ago
- A small code base for training large models ☆266 · Updated 2 weeks ago
- 🤖 A PyTorch library of curated Transformer models and their composable components ☆866 · Updated 7 months ago