NoakLiu / LLMEasyQuant
A Serving System for Distributed and Parallel LLM Quantization [Efficient ML System]
☆22 · Updated last month
Alternatives and similar repositories for LLMEasyQuant
Users that are interested in LLMEasyQuant are comparing it to the libraries listed below
- GraphSnapShot: Caching Local Structure for Fast Graph Learning [Efficient ML System] ☆33 · Updated last month
- Adaptive Topology Reconstruction for Robust Graph Representation Learning [Efficient ML Model] ☆10 · Updated 5 months ago
- Efficient Foundation Model Design: A Perspective From Model and System Co-Design [Efficient ML System & Model] ☆25 · Updated 4 months ago
- [arXiv 2025] Efficient Reasoning Models: A Survey ☆235 · Updated last week
- Accelerating Multitask Training Through Adaptive Transition [Efficient ML Model] ☆11 · Updated last month
- Paper list for Efficient Reasoning. ☆548 · Updated 3 weeks ago
- Official implementation of the ICLR paper "Streamlining Redundant Layers to Compress Large Language Models" ☆30 · Updated 2 months ago
- 📰 Must-read papers on KV Cache Compression (constantly updating 🤗). ☆484 · Updated 3 weeks ago
- [TKDE'25] The official GitHub page for the survey paper "A Survey on Mixture of Experts in Large Language Models". ☆390 · Updated 3 weeks ago
- [ICCV'25] The official code implementation of paper "Combining Similarity and Importance for Video Token Reduction on Large Visual Langua… ☆47 · Updated last week
- Paper List of Inference/Test Time Scaling/Computing ☆280 · Updated 2 weeks ago
- Official Repo for Open-Reasoner-Zero ☆1,990 · Updated last month
- Code for "Stop Looking for Important Tokens in Multimodal Language Models: Duplication Matters More" ☆60 · Updated 2 months ago
- [NeurIPS 2024 Oral🔥] DuQuant: Distributing Outliers via Dual Transformation Makes Stronger Quantized LLMs. ☆164 · Updated 9 months ago
- Awesome RL-based LLM Reasoning ☆561 · Updated 2 months ago
- TransMLA: Multi-Head Latent Attention Is All You Need ☆329 · Updated this week
- This repository collects papers for "A Survey on Knowledge Distillation of Large Language Models". We break down KD into Knowledge Elicit… ☆1,106 · Updated 4 months ago
- [ICLR 2025] Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models ☆118 · Updated last week
- 📰 Must-read papers and blogs on Speculative Decoding ⚡️ ☆828 · Updated 3 weeks ago
- Awesome LLM pruning papers: an all-in-one repository integrating useful resources and insights. ☆99 · Updated this week
- Survey Paper List - Efficient LLM and Foundation Models ☆252 · Updated 9 months ago
- OpenR: An Open Source Framework for Advanced Reasoning with Large Language Models ☆1,800 · Updated 6 months ago
- Official code implementation of the ICLR 2025 paper "Dobi-SVD: Differentiable SVD for LLM Compression and Some New Perspectives" ☆36 · Updated 3 months ago
- Awesome RL Reasoning Recipes ("Triple R") ☆745 · Updated last month
- PyTorch code for our paper "ARB-LLM: Alternating Refined Binarizations for Large Language Models" ☆25 · Updated 3 months ago
- Official PyTorch implementation of the paper "dLLM-Cache: Accelerating Diffusion Large Language Models with Adaptive Caching" (dLLM-Cache… ☆128 · Updated this week
- [ICML 2024] PyTorch implementation for "Diversified Batch Selection for Training Acceleration" ☆10 · Updated 11 months ago
- A curated list for Efficient Large Language Models ☆1,788 · Updated last month
- The official implementation of "Dynamic Tuning Towards Parameter and Inference Efficiency for ViT Adaptation" (NeurIPS 2024) ☆46 · Updated 6 months ago
- Code for "ECoFLaP: Efficient Coarse-to-Fine Layer-Wise Pruning for Vision-Language Models" (ICLR 2024) ☆19 · Updated last year