[ICML 2025] SparseLoRA: Accelerating LLM Fine-Tuning with Contextual Sparsity
☆71 · Mar 10, 2026 · Updated 2 weeks ago
Alternatives and similar repositories for sparselora
Users that are interested in sparselora are comparing it to the libraries listed below.
- HALO: Hadamard-Assisted Low-Precision Optimization and Training method for finetuning LLMs. The official implementation of https://arx… ☆29 · Feb 17, 2025 · Updated last year
- Vortex: A Flexible and Efficient Sparse Attention Framework ☆49 · Jan 21, 2026 · Updated 2 months ago
- [EMNLP 2024] RoLoRA: Fine-tuning Rotated Outlier-free LLMs for Effective Weight-Activation Quantization ☆38 · Sep 24, 2024 · Updated last year
- A WebUI for Side-by-Side Comparison of Media (Images/Videos) Across Multiple Folders ☆26 · Feb 21, 2025 · Updated last year
- [ICLR 2026 Oral] Locality-aware Parallel Decoding for Efficient Autoregressive Image Generation ☆94 · Mar 12, 2026 · Updated 2 weeks ago
- [ICML 2025] LaCache: Ladder-Shaped KV Caching for Efficient Long-Context Modeling of Large Language Models ☆18 · Nov 4, 2025 · Updated 4 months ago
- Fast, memory-efficient attention column reduction (e.g., sum, mean, max) ☆42 · Feb 10, 2026 · Updated last month
- [ICML 2024] Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference ☆377 · Jul 10, 2025 · Updated 8 months ago
- The official repo for LIFT: Language-Image Alignment with Fixed Text Encoders ☆42 · Jun 10, 2025 · Updated 9 months ago
- ☆122 · Feb 17, 2026 · Updated last month
- Real-Time VLAs via Future-state-aware Asynchronous Inference. ☆352 · Mar 6, 2026 · Updated 3 weeks ago
- A sparse attention kernel supporting mix sparse patterns ☆485 · Jan 18, 2026 · Updated 2 months ago
- ArkVale: Efficient Generative LLM Inference with Recallable Key-Value Eviction (NIPS'24) ☆52 · Dec 17, 2024 · Updated last year
- Quantized Attention on GPU ☆44 · Nov 22, 2024 · Updated last year
- ☆234 · Nov 19, 2025 · Updated 4 months ago
- [ASPLOS'26] Taming the Long-Tail: Efficient Reasoning RL Training with Adaptive Drafter ☆157 · Feb 27, 2026 · Updated last month
- (NeurIPS 2025 D&B Track) OverLayBench: A Benchmark for Layout-to-Image Generation with Dense Overlaps ☆25 · Jan 22, 2026 · Updated 2 months ago
- [ICLR 2025] DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads ☆532 · Feb 10, 2025 · Updated last year
- [ICLR2025] Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding ☆145 · Dec 4, 2024 · Updated last year
- [ICML 2025] XAttention: Block Sparse Attention with Antidiagonal Scoring ☆274 · Jul 6, 2025 · Updated 8 months ago
- [ICLR'25] ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation ☆154 · Mar 21, 2025 · Updated last year
- ☆14 · Jul 17, 2024 · Updated last year
- [ICML 2022] "DepthShrinker: A New Compression Paradigm Towards Boosting Real-Hardware Efficiency of Compact Neural Networks", by Yonggan … ☆35 · Jul 12, 2022 · Updated 3 years ago
- The official code for Dropping Backward Propagation (DropBP) ☆32 · Oct 29, 2024 · Updated last year
- Whisper finetuning ☆16 · Apr 9, 2025 · Updated 11 months ago
- [MLSys'24] Atom: Low-bit Quantization for Efficient and Accurate LLM Serving ☆336 · Jul 2, 2024 · Updated last year
- [ICML2025, NeurIPS2025 Spotlight] Sparse VideoGen 1 & 2: Accelerating Video Diffusion Transformers with Sparse Attention ☆648 · Mar 6, 2026 · Updated 3 weeks ago
- [NeurIPS'24 Spotlight, ICLR'25, ICML'25] To speed up Long-context LLMs' inference, approximate and dynamic sparse calculate the attention… ☆1,198 · Mar 9, 2026 · Updated 2 weeks ago
- Recent Advances on Efficient Vision Transformers ☆55 · Jan 11, 2023 · Updated 3 years ago
- ☆82 · Oct 18, 2025 · Updated 5 months ago
- ☆37 · Jul 19, 2025 · Updated 8 months ago
- NeurIPS 2024: RAGraph: A General Retrieval-Augmented Graph Learning Framework ☆21 · Feb 4, 2025 · Updated last year
- Code from PLDI '21 paper "Provable Repair of Deep Neural Networks." ☆10 · Nov 26, 2022 · Updated 3 years ago
- [ICML2025] SpargeAttention: A training-free sparse attention that accelerates any model inference. ☆969 · Feb 25, 2026 · Updated last month
- Set-Encoder: Permutation-Invariant Inter-Passage Attention for Listwise Passage Re-Ranking with Cross-Encoders ☆18 · May 23, 2025 · Updated 10 months ago
- [HPCA 2026] A GPU-optimized system for efficient long-context LLMs decoding with low-bit KV cache. ☆82 · Dec 18, 2025 · Updated 3 months ago
- A library for syntactically rewriting Python programs, pronounced (sinner). ☆66 · Feb 22, 2022 · Updated 4 years ago
- code for "EMS: 3D Eyebrow Modeling from Single-view Images"(SIGGRAPH Asia 2023) ☆13 · May 3, 2025 · Updated 10 months ago
- Model Compression Toolbox for Large Language Models and Diffusion Models ☆764 · Aug 14, 2025 · Updated 7 months ago