keli-wen / AGI-Study
Blog posts, reading reports, and code examples on AGI/LLM-related topics.
Related projects
Alternatives and complementary repositories for AGI-Study
- Multi-Candidate Speculative Decoding
- Awesome-LLM-KV-Cache: A curated list of Awesome LLM KV Cache Papers with Codes.
- MagicPIG: LSH Sampling for Efficient LLM Generation
- Code associated with the paper **Draft & Verify: Lossless Large Language Model Acceleration via Self-Speculative Decoding**
- Spec-Bench: A Comprehensive Benchmark and Unified Evaluation Platform for Speculative Decoding (ACL 2024 Findings)
- Awesome list for LLM quantization
- The Official Implementation of Ada-KV: Optimizing KV Cache Eviction by Adaptive Budget Allocation for Efficient LLM Inference
- The official code for the paper "Parallel Speculative Decoding with Adaptive Draft Length."
- USP: Unified (a.k.a. Hybrid, 2D) Sequence Parallel Attention for Long-Context Transformer Model Training and Inference
- An all-in-one repository of LLM pruning papers, integrating useful resources and insights.
- A tiny yet powerful LLM inference system tailored for research purposes. vLLM-equivalent performance with only 2k lines of code (2% of …
- ATC23 AE
- A repository sharing the literature on long-context large language models, including methodologies and evaluation benchmarks.
- [NeurIPS'23] H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models.
- Must-read papers on KV Cache Compression (constantly updated).
- A collection of 150+ surveys on LLMs
- [USENIX ATC '24] Accelerating the Training of Large Language Models using Efficient Activation Rematerialization and Optimal Hybrid Paral…
- Train LLMs (BLOOM, LLaMA, Baichuan2-7B, ChatGLM3-6B) with DeepSpeed pipeline mode. Faster than ZeRO/ZeRO++/FSDP.
- PyTorch distributed tutorials
- [ICML 2024] Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference
- A MoE implementation for PyTorch, [ATC'23] SmartMoE
- Related works and background techniques for OpenAI o1
- [ACL 2024] A novel QAT with Self-Distillation framework to enhance ultra-low-bit LLMs.
- A prototype repo for hybrid training with pipeline parallelism and distributed data parallelism, with comments on core code snippets. Feel free to…
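Several of the repositories above (Multi-Candidate Speculative Decoding, Draft & Verify, Spec-Bench, the adaptive-draft-length paper) center on speculative decoding. For orientation, here is a minimal toy sketch of the greedy draft-and-verify loop. The `draft_model` and `target_model` functions are hypothetical stand-ins over integer tokens, not any real model API, and the verification runs sequentially rather than batched:

```python
# Toy sketch of greedy draft-and-verify speculative decoding.
# The two "models" below are hypothetical stand-ins, not real LLMs.

def draft_model(tokens):
    # Cheap draft: predict the next token as (last token + 1) mod 10.
    return (tokens[-1] + 1) % 10

def target_model(tokens):
    # Expensive target: agrees with the draft except after token 5.
    return 0 if tokens[-1] == 5 else (tokens[-1] + 1) % 10

def speculative_step(tokens, k=4):
    """One decoding step: draft k tokens, then verify with the target.

    Returns the input extended by the accepted draft prefix plus one
    token from the target model (a correction, or a bonus token when
    every drafted token is accepted).
    """
    # 1) Draft phase: autoregressively propose k tokens with the cheap model.
    draft = list(tokens)
    for _ in range(k):
        draft.append(draft_model(draft))
    proposals = draft[len(tokens):]

    # 2) Verify phase: accept the longest prefix on which the target
    #    model's greedy prediction matches the draft.
    accepted = list(tokens)
    for tok in proposals:
        expected = target_model(accepted)
        if tok == expected:
            accepted.append(tok)
        else:
            accepted.append(expected)  # target's correction ends the step
            break
    else:
        accepted.append(target_model(accepted))  # all accepted: bonus token
    return accepted

print(speculative_step([1, 2, 3], k=4))  # → [1, 2, 3, 4, 5, 0]
```

In this trace the target accepts the drafted tokens 4 and 5, rejects 6, and emits its own correction 0, so one step produces three tokens instead of one. Real implementations verify all drafted positions in a single batched target forward pass, which is where the speedup comes from.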