NoakLiu / LLMEasyQuant
A Serving System for Distributed and Parallel LLM Quantization [Efficient ML System]
☆23Updated last month
Alternatives and similar repositories for LLMEasyQuant
Users who are interested in LLMEasyQuant are comparing it to the libraries listed below.
- GraphSnapShot: Caching Local Structure for Fast Graph Learning [Efficient ML System]☆34Updated last month
- Adaptive Topology Reconstruction for Robust Graph Representation Learning [Efficient ML Model]☆10Updated 6 months ago
- Accelerating Multitask Training Through Adaptive Transition [Efficient ML Model]☆12Updated 2 months ago
- [TKDE'25] The official GitHub page for the survey paper "A Survey on Mixture of Experts in Large Language Models".☆403Updated 3 weeks ago
- [arXiv 2025] Efficient Reasoning Models: A Survey☆248Updated this week
- Efficient Foundation Model Design: A Perspective From Model and System Co-Design [Efficient ML System & Model]☆25Updated 5 months ago
- TransMLA: Multi-Head Latent Attention Is All You Need☆337Updated 3 weeks ago
- PyTorch code for our paper "ARB-LLM: Alternating Refined Binarizations for Large Language Models"☆26Updated last week
- [ICML 2024] PyTorch implementation for "Diversified Batch Selection for Training Acceleration"☆10Updated last year
- Survey Paper List - Efficient LLM and Foundation Models☆253Updated 10 months ago
- Implementation of Switch Transformers from the paper: "Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficien…☆114Updated 3 weeks ago
- [NeurIPS'24 Oral] HydraLoRA: An Asymmetric LoRA Architecture for Efficient Fine-Tuning☆219Updated 8 months ago
- Official implementation of the ICLR paper "Streamlining Redundant Layers to Compress Large Language Models"☆31Updated 3 months ago
- [ICML 2024] Official code for the paper "Revisiting Zeroth-Order Optimization for Memory-Efficient LLM Fine-Tuning: A Benchmark".☆109Updated last month
- PyTorch implementation of "Multi-Level Optimal Transport for Universal Cross-Tokenizer Knowledge Distillation on Language Models", AAAI 2…☆37Updated 3 months ago
- ☆11Updated 8 months ago
- Official implementation of MASS: Multi-Agent Simulation Scaling for Portfolio Construction☆144Updated 2 months ago
- [ICCV'25] The official code implementation of paper "Combining Similarity and Importance for Video Token Reduction on Large Visual Langua…☆51Updated this week
- [ICLR 2025] Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models☆124Updated last month
- Implementation of the sparse attention pattern proposed by the Deepseek team in their "Native Sparse Attention" paper☆712Updated 2 months ago
- ☆147Updated 11 months ago
- A collection of AWESOME things about mixture-of-experts☆1,181Updated 8 months ago
- Code for the paper "VTool-R1: VLMs Learn to Think with Images via Reinforcement Learning on Multimodal Tool Use"☆106Updated this week
- Paper list for Efficient Reasoning.☆586Updated this week
- ☆26Updated last year
- [ICLR 2024] This is the official PyTorch implementation of "QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Mod…☆39Updated last year
- Paper List of Inference/Test Time Scaling/Computing☆289Updated last month
- ☆23Updated 8 months ago
- Code for "Stop Looking for Important Tokens in Multimodal Language Models: Duplication Matters More"☆64Updated 3 months ago
- [SIGIR'24] The official implementation code of MOELoRA.☆32Updated last year