keli-wen / AGI-Study
Blog posts, reading reports, and code examples for AGI/LLM-related knowledge.
☆36 · Updated last month
Alternatives and similar repositories for AGI-Study:
Users interested in AGI-Study are comparing it to the repositories listed below.
- ☆102 · Updated last week
- [ICLR 2025] PEARL: Parallel Speculative Decoding with Adaptive Draft Length ☆62 · Updated 2 weeks ago
- A survey of long-context LLMs from four perspectives: architecture, infrastructure, training, and evaluation ☆43 · Updated last week
- Pretrain, decay, and SFT a CodeLLM from scratch 🧙‍♂️ ☆35 · Updated 10 months ago
- ☆125 · Updated 3 weeks ago
- Reproducing R1 for Code with Reliable Rewards ☆132 · Updated 3 weeks ago
- ☆139 · Updated 2 weeks ago
- Multi-Candidate Speculative Decoding ☆35 · Updated 11 months ago
- A Comprehensive Survey on Long Context Language Modeling ☆86 · Updated last week
- The official implementation of Ada-KV: Optimizing KV Cache Eviction by Adaptive Budget Allocation for Efficient LLM Inference ☆67 · Updated 2 months ago
- ☆60 · Updated 4 months ago
- [ICLR 2025] Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models ☆80 · Updated last month
- RLHF experiments on a single A100 40GB GPU. Supports PPO, GRPO, REINFORCE, RAFT, RLOO, ReMax, and DeepSeek R1-Zero reproduction. ☆48 · Updated last month
- ☆70 · Updated 2 weeks ago
- A paper list on efficient Mixture of Experts for LLMs ☆47 · Updated 3 months ago
- A tiny yet powerful LLM inference system tailored for research purposes. vLLM-equivalent performance with only 2k lines of code (2% of …) ☆151 · Updated 8 months ago
- ☆65 · Updated 3 months ago
- A simple calculation of LLM MFU (Model FLOPs Utilization). ☆27 · Updated 3 weeks ago
- [ICML 2024] Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference ☆260 · Updated 4 months ago
- Odysseus: Playground of LLM Sequence Parallelism ☆66 · Updated 9 months ago
- Implementations of several LLM KV cache sparsity methods ☆30 · Updated 9 months ago
- [NeurIPS 2024] Fast Best-of-N Decoding via Speculative Rejection ☆40 · Updated 4 months ago
- Related works and background techniques for OpenAI o1 ☆217 · Updated 2 months ago
- ☆100 · Updated 11 months ago
- Code associated with the paper **Draft & Verify: Lossless Large Language Model Acceleration via Self-Speculative Decoding** ☆176 · Updated last month
- ☆232 · Updated 10 months ago
- Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models ☆130 · Updated 9 months ago
- SeerAttention: Learning Intrinsic Sparse Attention in Your LLMs ☆88 · Updated last week
- [ICLR 2025] DeFT: Decoding with Flash Tree-attention for Efficient Tree-structured LLM Inference ☆16 · Updated last week
- qwen-nsa ☆42 · Updated last week