[ICML 2025] SparseLoRA: Accelerating LLM Fine-Tuning with Contextual Sparsity
☆73 · Mar 10, 2026 · Updated last month
Alternatives and similar repositories for sparselora
Users that are interested in sparselora are comparing it to the libraries listed below.
- HALO: Hadamard-Assisted Low-Precision Optimization and Training method for finetuning LLMs. The official implementation of https://arx… ☆28 · Feb 17, 2025 · Updated last year
- [EMNLP 2024] RoLoRA: Fine-tuning Rotated Outlier-free LLMs for Effective Weight-Activation Quantization ☆39 · Sep 24, 2024 · Updated last year
- Vortex: A Flexible and Efficient Sparse Attention Framework ☆51 · Updated this week
- A WebUI for Side-by-Side Comparison of Media (Images/Videos) Across Multiple Folders ☆26 · Feb 21, 2025 · Updated last year
- A selective knowledge distillation algorithm for efficient speculative decoders ☆37 · Nov 27, 2025 · Updated 4 months ago
- [ICML 2025] LaCache: Ladder-Shaped KV Caching for Efficient Long-Context Modeling of Large Language Models ☆18 · Nov 4, 2025 · Updated 5 months ago
- Fast, memory-efficient attention column reduction (e.g., sum, mean, max) ☆44 · Feb 10, 2026 · Updated 2 months ago
- [ICML 2024] Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference ☆380 · Jul 10, 2025 · Updated 9 months ago
- ☆125 · Feb 17, 2026 · Updated last month
- Real-Time VLAs via Future-state-aware Asynchronous Inference. ☆361 · Mar 6, 2026 · Updated last month
- ArkVale: Efficient Generative LLM Inference with Recallable Key-Value Eviction (NIPS'24) ☆52 · Dec 17, 2024 · Updated last year
- A sparse attention kernel supporting mix sparse patterns ☆497 · Jan 18, 2026 · Updated 2 months ago
- [ACL 2024 Findings] Light-PEFT: Lightening Parameter-Efficient Fine-Tuning via Early Pruning ☆13 · Sep 2, 2024 · Updated last year
- Quantized Attention on GPU ☆44 · Nov 22, 2024 · Updated last year
- ☆239 · Nov 19, 2025 · Updated 4 months ago
- ☆16 · Apr 8, 2026 · Updated last week
- [ICLR 2025] DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads ☆536 · Feb 10, 2025 · Updated last year
- [ICML 2025] XAttention: Block Sparse Attention with Antidiagonal Scoring ☆276 · Jul 6, 2025 · Updated 9 months ago
- [ICLR'25] ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation ☆156 · Mar 21, 2025 · Updated last year
- [MLSys'25] QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving; [MLSys'25] LServe: Efficient Long-sequence LLM Se… ☆826 · Mar 6, 2025 · Updated last year
- ☆14 · Jul 17, 2024 · Updated last year
- ☆39 · Aug 28, 2025 · Updated 7 months ago
- [ICML 2022] "DepthShrinker: A New Compression Paradigm Towards Boosting Real-Hardware Efficiency of Compact Neural Networks", by Yonggan … ☆35 · Jul 12, 2022 · Updated 3 years ago
- The official code for Dropping Backward Propagation (DropBP) ☆32 · Oct 29, 2024 · Updated last year
- Faster Pytorch bitsandbytes 4bit fp4 nn.Linear ops ☆30 · Mar 16, 2024 · Updated 2 years ago
- Whisper finetuning ☆16 · Apr 9, 2025 · Updated last year
- Codebase for ICML'24 paper: Learning from Students: Applying t-Distributions to Explore Accurate and Efficient Formats for LLMs ☆27 · Jun 25, 2024 · Updated last year
- [MLSys'24] Atom: Low-bit Quantization for Efficient and Accurate LLM Serving ☆336 · Jul 2, 2024 · Updated last year
- [ICML2025, NeurIPS2025 Spotlight] Sparse VideoGen 1 & 2: Accelerating Video Diffusion Transformers with Sparse Attention ☆655 · Mar 6, 2026 · Updated last month
- [NeurIPS'24 Spotlight, ICLR'25, ICML'25] To speed up Long-context LLMs' inference, approximate and dynamic sparse calculate the attention… ☆1,203 · Apr 8, 2026 · Updated last week
- Recent Advances on Efficient Vision Transformers ☆55 · Jan 11, 2023 · Updated 3 years ago
- ☆18 · Oct 22, 2024 · Updated last year
- ☆82 · Oct 18, 2025 · Updated 5 months ago
- ☆38 · Jul 19, 2025 · Updated 8 months ago
- NeurIPS 2024: RAGraph: A General Retrieval-Augmented Graph Learning Framework ☆21 · Feb 4, 2025 · Updated last year
- Code from PLDI '21 paper "Provable Repair of Deep Neural Networks." ☆10 · Nov 26, 2022 · Updated 3 years ago
- [COLM 2025] Code for Paper: Learning Adaptive Parallel Reasoning with Language Models ☆143 · Dec 17, 2025 · Updated 3 months ago
- Set-Encoder: Permutation-Invariant Inter-Passage Attention for Listwise Passage Re-Ranking with Cross-Encoders ☆18 · May 23, 2025 · Updated 10 months ago
- [HPCA 2026] A GPU-optimized system for efficient long-context LLMs decoding with low-bit KV cache. ☆85 · Dec 18, 2025 · Updated 3 months ago