[ICML 2025] SparseLoRA: Accelerating LLM Fine-Tuning with Contextual Sparsity
★71 · Updated Jul 5, 2025
Alternatives and similar repositories for sparselora
Users interested in sparselora are comparing it to the repositories listed below.
- [EMNLP 2024] RoLoRA: Fine-tuning Rotated Outlier-free LLMs for Effective Weight-Activation Quantization (★38 · updated Sep 24, 2024)
- HALO: Hadamard-Assisted Low-Precision Optimization and Training method for finetuning LLMs. The official implementation of https://ar… (★29 · updated Feb 17, 2025)
- Vortex: A Flexible and Efficient Sparse Attention Framework (★48 · updated Jan 21, 2026)
- A WebUI for Side-by-Side Comparison of Media (Images/Videos) Across Multiple Folders (★25 · updated Feb 21, 2025)
- [ICLR 2026 Oral] Locality-aware Parallel Decoding for Efficient Autoregressive Image Generation (★91 · updated Feb 7, 2026)
- [ACL 2024 Findings] Light-PEFT: Lightening Parameter-Efficient Fine-Tuning via Early Pruning (★13 · updated Sep 2, 2024)
- A selective knowledge distillation algorithm for efficient speculative decoders (★36 · updated Nov 27, 2025)
- [ICML 2024] Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference (★374 · updated Jul 10, 2025)
- Fast, memory-efficient attention column reduction (e.g., sum, mean, max) (★37 · updated Feb 10, 2026)
- A lightweight C++ LLaMA inference engine for mobile devices (★15 · updated Oct 28, 2023)
- Set-Encoder: Permutation-Invariant Inter-Passage Attention for Listwise Passage Re-Ranking with Cross-Encoders (★18 · updated May 23, 2025)
- Repository for the Q-Filters method (https://arxiv.org/pdf/2503.02812) (★35 · updated Mar 7, 2025)
- Real-Time VLAs via Future-state-aware Asynchronous Inference (★328 · updated Feb 28, 2026)
- [ICML 2022] "DepthShrinker: A New Compression Paradigm Towards Boosting Real-Hardware Efficiency of Compact Neural Networks", by Yonggan … (★35 · updated Jul 12, 2022)
- ★112 · updated Feb 17, 2026
- ★20 · updated May 7, 2025
- ArkVale: Efficient Generative LLM Inference with Recallable Key-Value Eviction (NIPS'24) (★53 · updated Dec 17, 2024)
- Compiler-R1: Towards Agentic Compiler Auto-tuning with Reinforcement Learning (★28 · updated Jul 14, 2025)
- The official repo for LIFT: Language-Image Alignment with Fixed Text Encoders (★42 · updated Jun 10, 2025)
- Quantized Attention on GPU (★44 · updated Nov 22, 2024)
- [ICLR 2025] DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads (★527 · updated Feb 10, 2025)
- A sparse attention kernel supporting mixed sparse patterns (★472 · updated Jan 18, 2026)
- [ICLR 2025] Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding (★143 · updated Dec 4, 2024)
- [MLSys'25] QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving; [MLSys'25] LServe: Efficient Long-sequence LLM Se… (★817 · updated Mar 6, 2025)
- [ICLR'25] ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation (★151 · updated Mar 21, 2025)
- [ICML 2025] XAttention: Block Sparse Attention with Antidiagonal Scoring (★269 · updated Jul 6, 2025)
- BitPack is a practical tool to efficiently save ultra-low-precision/mixed-precision quantized models. (★58 · updated Feb 7, 2023)
- ★34 · updated Oct 9, 2025
- ★21 · updated Dec 27, 2019
- [ASPLOS'26] Taming the Long-Tail: Efficient Reasoning RL Training with Adaptive Drafter (★149 · updated Feb 27, 2026)
- ★227 · updated Nov 19, 2025
- [ICLR 2026] SparseD: Sparse Attention for Diffusion Language Models (★59 · updated Feb 22, 2026)
- torch_quantizer is an out-of-the-box quantization tool for PyTorch models on the CUDA backend, specially optimized for diffusion models. (★24 · updated Mar 29, 2024)
- [HPCA 2026] A GPU-optimized system for efficient long-context LLM decoding with a low-bit KV cache. (★81 · updated Dec 18, 2025)
- PoC for "SpecReason: Fast and Accurate Inference-Time Compute via Speculative Reasoning" [NeurIPS '25] (★65 · updated Oct 2, 2025)
- ★34 · updated Aug 18, 2025
- Quantize transformers to any learned arbitrary 4-bit numeric format (★51 · updated Jan 25, 2026)
- Compression for Foundation Models (★35 · updated Jul 21, 2025)
- Fira: Can We Achieve Full-rank Training of LLMs Under Low-rank Constraint? (★118 · updated Oct 21, 2024)