NoakLiu / LLMEasyQuant
A Serving System for Distributed and Parallel LLM Quantization [Efficient ML System]
☆24 · Updated 2 months ago
Alternatives and similar repositories for LLMEasyQuant
Users who are interested in LLMEasyQuant are comparing it to the libraries listed below.
- GraphSnapShot: Caching Local Structure for Fast Graph Learning [Efficient ML System] ☆35 · Updated 2 months ago
- Adaptive Topology Reconstruction for Robust Graph Representation Learning [Efficient ML Model] ☆10 · Updated 6 months ago
- [arXiv 2025] Efficient Reasoning Models: A Survey ☆259 · Updated this week
- TransMLA: Multi-Head Latent Attention Is All You Need ☆349 · Updated this week
- Efficient Foundation Model Design: A Perspective From Model and System Co-Design [Efficient ML System & Model] ☆25 · Updated 6 months ago
- My learning notes/codes for ML SYS. ☆3,515 · Updated this week
- 📰 Must-read papers on KV Cache Compression (constantly updating 🤗). ☆525 · Updated last month
- [ICML 2024] PyTorch implementation for "Diversified Batch Selection for Training Acceleration" ☆10 · Updated last year
- [TKDE'25] The official GitHub page for the survey paper "A Survey on Mixture of Experts in Large Language Models". ☆412 · Updated last month
- Official Repo for Open-Reasoner-Zero ☆2,027 · Updated 3 months ago
- Accelerating Multitask Training Through Adaptive Transition [Efficient ML Model] ☆12 · Updated 3 months ago
- Paper list for Efficient Reasoning. ☆642 · Updated this week
- Simple RL training for reasoning ☆3,733 · Updated last month
- Official implementation of the ICLR paper "Streamlining Redundant Layers to Compress Large Language Models" ☆31 · Updated 4 months ago
- 🐳 Efficient Triton implementations for "Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention" ☆839 · Updated 5 months ago
- Large Language Model (LLM) Systems Paper List ☆1,481 · Updated this week
- Implementation of the sparse attention pattern proposed by the Deepseek team in their "Native Sparse Attention" paper ☆735 · Updated 3 weeks ago
- 🚀 Efficient implementations of state-of-the-art linear attention models ☆3,091 · Updated last week
- Awesome RL-based LLM Reasoning ☆613 · Updated last month
- Official code implementation for 2025 ICLR accepted paper "Dobi-SVD : Differentiable SVD for LLM Compression and Some New Perspectives" ☆39 · Updated 5 months ago
- [ICLR 2025] Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models ☆126 · Updated last month
- A collection of AWESOME things about mixture-of-experts ☆1,197 · Updated 8 months ago
- An Efficient and User-Friendly Scaling Library for Reinforcement Learning with Large Language Models ☆1,862 · Updated last week
- Distributed RL System for LLM Reasoning ☆2,546 · Updated this week
- Survey Paper List - Efficient LLM and Foundation Models ☆255 · Updated 11 months ago
- An Open-source RL System from ByteDance Seed and Tsinghua AIR ☆1,532 · Updated 3 months ago
- Paper List of Inference/Test Time Scaling/Computing ☆301 · Updated last week
- [TMLR 2025] Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models ☆591 · Updated last week
- A Telegram bot to recommend arXiv papers ☆280 · Updated 4 months ago
- Reproduce R1 Zero on Logic Puzzle ☆2,394 · Updated 5 months ago