z-lab / sparselora
[ICML 2025] SparseLoRA: Accelerating LLM Fine-Tuning with Contextual Sparsity
★ 71, updated Jul 5, 2025
Alternatives and similar repositories for sparselora
Users interested in sparselora are comparing it to the repositories listed below.
- [EMNLP 2024] RoLoRA: Fine-tuning Rotated Outlier-free LLMs for Effective Weight-Activation Quantization (★ 37, updated Sep 24, 2024)
- HALO: Hadamard-Assisted Low-Precision Optimization and Training method for finetuning LLMs. The official implementation of https://ar… (★ 29, updated Feb 17, 2025)
- Vortex: A Flexible and Efficient Sparse Attention Framework (★ 46, updated Jan 21, 2026)
- [ICLR 2026 Oral] Locality-aware Parallel Decoding for Efficient Autoregressive Image Generation (★ 82, updated Feb 7, 2026)
- A WebUI for Side-by-Side Comparison of Media (Images/Videos) Across Multiple Folders (★ 25, updated Feb 21, 2025)
- [ICML 2025] LaCache: Ladder-Shaped KV Caching for Efficient Long-Context Modeling of Large Language Models (★ 17, updated Nov 4, 2025)
- A selective knowledge distillation algorithm for efficient speculative decoders (★ 36, updated Nov 27, 2025)
- [ACL 2024 Findings] Light-PEFT: Lightening Parameter-Efficient Fine-Tuning via Early Pruning (★ 13, updated Sep 2, 2024)
- Fast, memory-efficient attention column reduction (e.g., sum, mean, max) (★ 34, updated this week)
- [ICML 2024] Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference (★ 372, updated Jul 10, 2025)
- A lightweight C++ LLaMA inference engine for mobile devices (★ 15, updated Oct 28, 2023)
- Set-Encoder: Permutation-Invariant Inter-Passage Attention for Listwise Passage Re-Ranking with Cross-Encoders (★ 18, updated May 23, 2025)
- Repository for the Q-Filters method (https://arxiv.org/pdf/2503.02812) (★ 35, updated Mar 7, 2025)
- Real-Time VLAs via Future-state-aware Asynchronous Inference (★ 313, updated Jan 30, 2026)
- [NeurIPS 2024] RAGraph: A General Retrieval-Augmented Graph Learning Framework (★ 21, updated Feb 4, 2025)
- [NeurIPS 2024] ArkVale: Efficient Generative LLM Inference with Recallable Key-Value Eviction (★ 53, updated Dec 17, 2024)
- Compiler-R1: Towards Agentic Compiler Auto-tuning with Reinforcement Learning (★ 28, updated Jul 14, 2025)
- The official repo for LIFT: Language-Image Alignment with Fixed Text Encoders (★ 42, updated Jun 10, 2025)
- Quantized Attention on GPU (★ 44, updated Nov 22, 2024)
- A sparse attention kernel supporting mixed sparse patterns (★ 455, updated Jan 18, 2026)
- [ICLR 2025] DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads (★ 524, updated Feb 10, 2025)
- [ICLR 2025] Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding (★ 142, updated Dec 4, 2024)
- [ICLR 2025] ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation (★ 149, updated Mar 21, 2025)
- [MLSys 2025] QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving; [MLSys 2025] LServe: Efficient Long-sequence LLM Se… (★ 812, updated Mar 6, 2025)
- [ICML 2025] XAttention: Block Sparse Attention with Antidiagonal Scoring (★ 269, updated Jul 6, 2025)
- BitPack is a practical tool to efficiently save ultra-low precision/mixed-precision quantized models. (★ 58, updated Feb 7, 2023)
- torch_quantizer is an out-of-the-box quantization tool for PyTorch models on the CUDA backend, specially optimized for Diffusion Models. (★ 23, updated Mar 29, 2024)
- PoC for "SpecReason: Fast and Accurate Inference-Time Compute via Speculative Reasoning" [NeurIPS 2025] (★ 61, updated Oct 2, 2025)
- [ICLR 2026] SparseD: Sparse Attention for Diffusion Language Models (★ 57, updated Oct 7, 2025)
- [HPCA 2026] A GPU-optimized system for efficient long-context LLM decoding with a low-bit KV cache. (★ 80, updated Dec 18, 2025)
- Codebase for the ICML 2024 paper: Learning from Students: Applying t-Distributions to Explore Accurate and Efficient Formats for LLMs (★ 27, updated Jun 25, 2024)
- Quantize transformers to any learned arbitrary 4-bit numeric format (★ 51, updated Jan 25, 2026)
- Compression for Foundation Models (★ 35, updated Jul 21, 2025)
- Fira: Can We Achieve Full-rank Training of LLMs Under Low-rank Constraint? (★ 119, updated Oct 21, 2024)
- [MLSys 2024] Atom: Low-bit Quantization for Efficient and Accurate LLM Serving (★ 336, updated Jul 2, 2024)
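
For readers unfamiliar with the technique named in the title, the sketch below illustrates the general idea of contextual sparsity in a LoRA layer: a cheap, context-dependent predictor picks which output channels matter for the current batch, and both the frozen weight and the LoRA branch are evaluated only on those channels. This is a minimal illustration under stated assumptions, not the sparselora repository's implementation; the class name, the norm-based channel-scoring heuristic, and all hyperparameters are hypothetical.

```python
import torch
import torch.nn as nn


class ContextualSparseLoRALinear(nn.Module):
    """Hypothetical sketch: a LoRA linear layer with context-dependent
    output-channel sparsity. Not the sparselora implementation."""

    def __init__(self, d_in: int, d_out: int, rank: int = 8, keep_ratio: float = 0.25):
        super().__init__()
        # Frozen pretrained weight (kept dense in memory, computed sparsely).
        self.weight = nn.Parameter(torch.randn(d_out, d_in) * 0.02, requires_grad=False)
        # Trainable low-rank adapters, LoRA-style (B starts at zero).
        self.lora_A = nn.Parameter(torch.randn(rank, d_in) * 0.02)
        self.lora_B = nn.Parameter(torch.zeros(d_out, rank))
        self.keep_ratio = keep_ratio

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, d_in)
        # Contextual predictor (an illustrative assumption): score each output
        # channel by the frozen weight's response to this batch's mean input,
        # then keep only the top-k channels for this forward pass.
        with torch.no_grad():
            scores = (self.weight @ x.mean(dim=0)).abs()   # (d_out,)
            k = max(1, int(self.keep_ratio * scores.numel()))
            idx = scores.topk(k).indices                   # active output channels

        out = x.new_zeros(x.size(0), self.weight.size(0))
        # Compute only the selected rows of the dense and LoRA branches;
        # the skipped channels are where the speedup comes from.
        out[:, idx] = x @ self.weight[idx].t() + (x @ self.lora_A.t()) @ self.lora_B[idx].t()
        return out
```

For example, `ContextualSparseLoRALinear(1024, 1024)(torch.randn(4, 1024))` evaluates roughly a quarter of the output channels for that batch while gradients still flow to the selected LoRA rows; the actual repository presumably uses a learned or calibrated sparsity predictor rather than this norm heuristic.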