[ICML 2025] SparseLoRA: Accelerating LLM Fine-Tuning with Contextual Sparsity
★75 · Mar 10, 2026 · Updated 2 months ago
Alternatives and similar repositories for sparselora
Users that are interested in sparselora are comparing it to the libraries listed below.
- HALO: Hadamard-Assisted Low-Precision Optimization and Training method for finetuning LLMs. The official implementation of https://arx… · ★29 · Feb 17, 2025 · Updated last year
- [EMNLP 2024] RoLoRA: Fine-tuning Rotated Outlier-free LLMs for Effective Weight-Activation Quantization · ★40 · Sep 24, 2024 · Updated last year
- Vortex: A Flexible and Efficient Sparse Attention Framework · ★53 · Updated this week
- A WebUI for Side-by-Side Comparison of Media (Images/Videos) Across Multiple Folders · ★27 · Feb 21, 2025 · Updated last year
- A selective knowledge distillation algorithm for efficient speculative decoders · ★40 · Nov 27, 2025 · Updated 6 months ago
- [ICML 2025] LaCache: Ladder-Shaped KV Caching for Efficient Long-Context Modeling of Large Language Models · ★18 · Nov 4, 2025 · Updated 6 months ago
- Fast, memory-efficient attention column reduction (e.g., sum, mean, max) · ★46 · Feb 10, 2026 · Updated 3 months ago
- The official repo for LIFT: Language-Image Alignment with Fixed Text Encoders · ★42 · Jun 10, 2025 · Updated 11 months ago
- [ICML 2024] Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference · ★386 · Jul 10, 2025 · Updated 10 months ago
- ★135 · Feb 17, 2026 · Updated 3 months ago
- Real-Time VLAs via Future-state-aware Asynchronous Inference. · ★393 · Apr 22, 2026 · Updated last month
- A sparse attention kernel supporting mix sparse patterns · ★517 · Jan 18, 2026 · Updated 4 months ago
- Quantized Attention on GPU · ★44 · Nov 22, 2024 · Updated last year
- ArkVale: Efficient Generative LLM Inference with Recallable Key-Value Eviction (NIPS'24) · ★53 · Dec 17, 2024 · Updated last year
- (NeurIPS 2025 D&B Track) OverLayBench: A Benchmark for Layout-to-Image Generation with Dense Overlaps · ★26 · May 4, 2026 · Updated 3 weeks ago
- ★248 · Nov 19, 2025 · Updated 6 months ago
- ★16 · Apr 8, 2026 · Updated last month
- [ICLR 2025] DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads · ★542 · Feb 10, 2025 · Updated last year
- [ICLR2025] Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding · ★149 · Dec 4, 2024 · Updated last year
- [ASPLOS'26] Taming the Long-Tail: Efficient Reasoning RL Training with Adaptive Drafter · ★170 · Feb 27, 2026 · Updated 3 months ago
- QuTLASS: CUTLASS-Powered Quantized BLAS for Deep Learning · ★181 · Nov 11, 2025 · Updated 6 months ago
- [ICML 2025] XAttention: Block Sparse Attention with Antidiagonal Scoring · ★277 · Jul 6, 2025 · Updated 10 months ago
- [MLSys'25] QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving; [MLSys'25] LServe: Efficient Long-sequence LLM Se… · ★840 · Mar 6, 2025 · Updated last year
- [ICML 2022] "DepthShrinker: A New Compression Paradigm Towards Boosting Real-Hardware Efficiency of Compact Neural Networks", by Yonggan … · ★35 · Jul 12, 2022 · Updated 3 years ago
- The official code for Dropping Backward Propagation (DropBP) · ★32 · Oct 29, 2024 · Updated last year
- Faster Pytorch bitsandbytes 4bit fp4 nn.Linear ops · ★30 · Mar 16, 2024 · Updated 2 years ago
- [MLSys'24] Atom: Low-bit Quantization for Efficient and Accurate LLM Serving · ★340 · Jul 2, 2024 · Updated last year
- Codebase for ICML'24 paper: Learning from Students: Applying t-Distributions to Explore Accurate and Efficient Formats for LLMs · ★27 · Jun 25, 2024 · Updated last year
- [ICML2025, NeurIPS2025 Spotlight] Sparse VideoGen 1 & 2: Accelerating Video Diffusion Transformers with Sparse Attention · ★665 · Mar 6, 2026 · Updated 2 months ago
- Official implementation of "Poisoned Forgery Face: Towards Backdoor Attacks on Face Forgery Detection" (ICLR 2024) · ★18 · Apr 15, 2024 · Updated 2 years ago
- ★83 · Oct 18, 2025 · Updated 7 months ago
- ★38 · Jul 19, 2025 · Updated 10 months ago
- Code from PLDI '21 paper "Provable Repair of Deep Neural Networks." · ★10 · Nov 26, 2022 · Updated 3 years ago
- [ICML2025] SpargeAttention: A training-free sparse attention that accelerates any model inference. · ★995 · Feb 25, 2026 · Updated 3 months ago
- Set-Encoder: Permutation-Invariant Inter-Passage Attention for Listwise Passage Re-Ranking with Cross-Encoders · ★19 · May 23, 2025 · Updated last year
- A library for syntactically rewriting Python programs, pronounced (sinner). · ★66 · Feb 22, 2022 · Updated 4 years ago
- ★192 · Jan 14, 2025 · Updated last year
- Model Compression Toolbox for Large Language Models and Diffusion Models · ★784 · Aug 14, 2025 · Updated 9 months ago
- [ICLR 2025] COAT: Compressing Optimizer States and Activation for Memory-Efficient FP8 Training · ★264 · Aug 9, 2025 · Updated 9 months ago