evanmiller / LLM-Reading-List
LLM papers I'm reading, mostly on inference and model compression
☆715 Updated last year
Alternatives and similar repositories for LLM-Reading-List:
Users who are interested in LLM-Reading-List are comparing it to the repositories listed below.
- What would you do with 1000 H100s... ☆1,021 Updated last year
- ☆512 Updated 7 months ago
- 🤖 A PyTorch library of curated Transformer models and their composable components ☆883 Updated 11 months ago
- NeurIPS Large Language Model Efficiency Challenge: 1 LLM + 1GPU + 1Day ☆255 Updated last year
- Deep learning for dummies. All the practical details and useful utilities that go into working with real models. ☆783 Updated 3 weeks ago
- [ICML 2024] Break the Sequential Dependency of LLM Inference Using Lookahead Decoding ☆1,218 Updated 2 weeks ago
- Llama from scratch, or How to implement a paper without crying ☆550 Updated 9 months ago
- ☆412 Updated last year
- Serving multiple LoRA finetuned LLM as one ☆1,040 Updated 10 months ago
- A comprehensive deep dive into the world of tokens ☆221 Updated 9 months ago
- Finetuning Large Language Models on One Consumer GPU in 2 Bits ☆719 Updated 10 months ago
- An ML Systems Onboarding list ☆734 Updated 2 months ago
- Minimalistic, extremely fast, and hackable researcher's toolbench for GPT models in 307 lines of code. Reaches <3.8 validation loss on wi… ☆343 Updated 7 months ago
- A benchmark to evaluate language models on questions I've previously asked them to solve. ☆990 Updated last month
- Fine-tune mistral-7B on 3090s, a100s, h100s ☆709 Updated last year
- Fast & Simple repository for pre-training and fine-tuning T5-style models ☆1,000 Updated 7 months ago
- An implementation of the transformer architecture onto an Nvidia CUDA kernel ☆174 Updated last year
- Cramming the training of a (BERT-type) language model into limited compute. ☆1,325 Updated 9 months ago
- Puzzles for exploring transformers ☆333 Updated last year
- Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads ☆2,466 Updated 9 months ago
- Official implementation of Half-Quadratic Quantization (HQQ) ☆770 Updated this week
- batched loras ☆340 Updated last year
- ☆864 Updated last year
- The Art of Debugging ☆861 Updated 7 months ago
- A repository for research on medium sized language models. ☆493 Updated 2 months ago
- YaRN: Efficient Context Window Extension of Large Language Models ☆1,451 Updated 11 months ago
- S-LoRA: Serving Thousands of Concurrent LoRA Adapters ☆1,801 Updated last year
- A simple and effective LLM pruning approach. ☆725 Updated 7 months ago
- A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton. ☆523 Updated last month
- Minimalistic large language model 3D-parallelism training ☆1,701 Updated this week