[ICML 2025] SparseLoRA: Accelerating LLM Fine-Tuning with Contextual Sparsity
★75 · Mar 10, 2026 · Updated last month
Alternatives and similar repositories for sparselora
Users that are interested in sparselora are comparing it to the libraries listed below.
- HALO: Hadamard-Assisted Low-Precision Optimization and Training method for fine-tuning LLMs. The official implementation of https://arx… ★29 · Feb 17, 2025 · Updated last year
- [EMNLP 2024] RoLoRA: Fine-tuning Rotated Outlier-free LLMs for Effective Weight-Activation Quantization ★39 · Sep 24, 2024 · Updated last year
- Vortex: A Flexible and Efficient Sparse Attention Framework ★53 · Apr 30, 2026 · Updated last week
- A WebUI for Side-by-Side Comparison of Media (Images/Videos) Across Multiple Folders ★27 · Feb 21, 2025 · Updated last year
- [ICLR 2026 Oral] Locality-aware Parallel Decoding for Efficient Autoregressive Image Generation ★100 · Mar 12, 2026 · Updated last month
- [ICML 2025] LaCache: Ladder-Shaped KV Caching for Efficient Long-Context Modeling of Large Language Models ★18 · Nov 4, 2025 · Updated 6 months ago
- Fast, memory-efficient attention column reduction (e.g., sum, mean, max) ★46 · Feb 10, 2026 · Updated 3 months ago
- [ICML 2024] Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference ★384 · Jul 10, 2025 · Updated 9 months ago
- The official repo for LIFT: Language-Image Alignment with Fixed Text Encoders ★42 · Jun 10, 2025 · Updated 11 months ago
- ★130 · Feb 17, 2026 · Updated 2 months ago
- Real-Time VLAs via Future-state-aware Asynchronous Inference ★387 · Apr 22, 2026 · Updated 2 weeks ago
- [ACL 2024 Findings] Light-PEFT: Lightening Parameter-Efficient Fine-Tuning via Early Pruning ★13 · Sep 2, 2024 · Updated last year
- A sparse attention kernel supporting mixed sparse patterns ★509 · Jan 18, 2026 · Updated 3 months ago
- Quantized Attention on GPU ★44 · Nov 22, 2024 · Updated last year
- ArkVale: Efficient Generative LLM Inference with Recallable Key-Value Eviction (NIPS'24) ★53 · Dec 17, 2024 · Updated last year
- (NeurIPS 2025 D&B Track) OverLayBench: A Benchmark for Layout-to-Image Generation with Dense Overlaps ★26 · Updated this week
- [ASPLOS'26] Taming the Long-Tail: Efficient Reasoning RL Training with Adaptive Drafter ★163 · Feb 27, 2026 · Updated 2 months ago
- ★248 · Nov 19, 2025 · Updated 5 months ago
- ★16 · Apr 8, 2026 · Updated last month
- [ICLR 2025] Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding ★146 · Dec 4, 2024 · Updated last year
- [ICLR 2025] DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads ★543 · Feb 10, 2025 · Updated last year
- [ICML 2025] XAttention: Block Sparse Attention with Antidiagonal Scoring ★277 · Jul 6, 2025 · Updated 10 months ago
- [ICLR'25] ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation ★156 · Mar 21, 2025 · Updated last year
- ★14 · Jul 17, 2024 · Updated last year
- [ICML 2022] "DepthShrinker: A New Compression Paradigm Towards Boosting Real-Hardware Efficiency of Compact Neural Networks", by Yonggan … ★35 · Jul 12, 2022 · Updated 3 years ago
- The official code for Dropping Backward Propagation (DropBP) ★32 · Oct 29, 2024 · Updated last year
- Faster PyTorch bitsandbytes 4-bit fp4 nn.Linear ops ★30 · Mar 16, 2024 · Updated 2 years ago
- [ICML 2025, NeurIPS 2025 Spotlight] Sparse VideoGen 1 & 2: Accelerating Video Diffusion Transformers with Sparse Attention ★664 · Mar 6, 2026 · Updated 2 months ago
- [NeurIPS'24 Spotlight, ICLR'25, ICML'25] Speeds up long-context LLM inference with approximate and dynamic sparse attention computation… ★1,211 · Apr 8, 2026 · Updated last month
- ★83 · Oct 18, 2025 · Updated 6 months ago
- ★38 · Jul 19, 2025 · Updated 9 months ago
- Code from the PLDI '21 paper "Provable Repair of Deep Neural Networks" ★10 · Nov 26, 2022 · Updated 3 years ago
- [COLM 2025] Code for the paper "Learning Adaptive Parallel Reasoning with Language Models" ★143 · Dec 17, 2025 · Updated 4 months ago
- Set-Encoder: Permutation-Invariant Inter-Passage Attention for Listwise Passage Re-Ranking with Cross-Encoders ★18 · May 23, 2025 · Updated 11 months ago
- [HPCA 2026] A GPU-optimized system for efficient long-context LLM decoding with a low-bit KV cache ★85 · Dec 18, 2025 · Updated 4 months ago
- A library for syntactically rewriting Python programs, pronounced "sinner" ★66 · Feb 22, 2022 · Updated 4 years ago
- ★192 · Jan 14, 2025 · Updated last year
- Model Compression Toolbox for Large Language Models and Diffusion Models ★779 · Aug 14, 2025 · Updated 8 months ago
- Helper functions for processing and integrating visual-language information with the Qwen-VL series of models ★18 · Aug 30, 2024 · Updated last year