[ICML 2025] SparseLoRA: Accelerating LLM Fine-Tuning with Contextual Sparsity
☆76Mar 10, 2026Updated 3 months ago
Alternatives and similar repositories for sparselora
Users that are interested in sparselora are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- HALO: Hadamard-Assisted Low-Precision Optimization and Training method for finetuning LLMs. 🚀 The official implementation of https://arx…☆29Feb 17, 2025Updated last year
- [EMNLP 2024] RoLoRA: Fine-tuning Rotated Outlier-free LLMs for Effective Weight-Activation Quantization☆40Sep 24, 2024Updated last year
- Vortex: Programmable Sparse Attention for Agents as Algorithm Designers☆61Jun 8, 2026Updated last week
- A WebUI for Side-by-Side Comparison of Media (Images/Videos) Across Multiple Folders☆27Feb 21, 2025Updated last year
- [ICLR 2026 Oral] Locality-aware Parallel Decoding for Efficient Autoregressive Image Generation☆104May 8, 2026Updated last month
- Managed hosting for WordPress and PHP on Cloudways • AdManaged hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
- A selective knowledge distillation algorithm for efficient speculative decoders☆40Nov 27, 2025Updated 6 months ago
- [ICML 2025] LaCache: Ladder-Shaped KV Caching for Efficient Long-Context Modeling of Large Language Models☆18Nov 4, 2025Updated 7 months ago
- The official repo for LIFT: Language-Image Alignment with Fixed Text Encoders☆42Jun 10, 2025Updated last year
- [ICML 2024] Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference☆393Jul 10, 2025Updated 11 months ago
- ☆137Feb 17, 2026Updated 4 months ago
- Real-Time VLAs via Future-state-aware Asynchronous Inference.☆415Apr 22, 2026Updated last month
- [ACL 2024 Findings] Light-PEFT: Lightening Parameter-Efficient Fine-Tuning via Early Pruning☆13Sep 2, 2024Updated last year
- A sparse attention kernel supporting mix sparse patterns☆525Jan 18, 2026Updated 5 months ago
- Quantized Attention on GPU☆44Nov 22, 2024Updated last year
- Deploy on Railway without the complexity - Free Credits Offer • AdConnect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
- ArkVale: Efficient Generative LLM Inference with Recallable Key-Value Eviction (NIPS'24)☆53Dec 17, 2024Updated last year
- (NeurIPS 2025 D&B Track) OverLayBench: A Benchmark for Layout-to-Image Generation with Dense Overlaps☆27May 4, 2026Updated last month
- ☆249Nov 19, 2025Updated 7 months ago
- [ICLR 2025] DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads☆540Feb 10, 2025Updated last year
- [ICLR2025] Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding☆152Dec 4, 2024Updated last year
- [ASPLOS'26] Taming the Long-Tail: Efficient Reasoning RL Training with Adaptive Drafter☆173Feb 27, 2026Updated 3 months ago
- [ICML 2025] XAttention: Block Sparse Attention with Antidiagonal Scoring☆281Jul 6, 2025Updated 11 months ago
- [ICLR'25] ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation☆162Mar 21, 2025Updated last year
- [MLSys'25] QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving; [MLSys'25] LServe: Efficient Long-sequence LLM Se…☆844Mar 6, 2025Updated last year
- Wordpress hosting with auto-scaling - Free Trial Offer • AdFully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
- ☆14Jul 17, 2024Updated last year
- ICLR 2026☆44May 29, 2026Updated 3 weeks ago
- [ICML 2022] "DepthShrinker: A New Compression Paradigm Towards Boosting Real-Hardware Efficiency of Compact Neural Networks", by Yonggan …☆35Jul 12, 2022Updated 3 years ago
- The official code for Dropping Backward Propagation (DropBP)☆32Oct 29, 2024Updated last year
- Faster Pytorch bitsandbytes 4bit fp4 nn.Linear ops☆30Mar 16, 2024Updated 2 years ago
- Whisper finetuning☆17Apr 9, 2025Updated last year
- [MLSys'24] Atom: Low-bit Quantization for Efficient and Accurate LLM Serving☆341Jul 2, 2024Updated last year
- Codebase for ICML'24 paper: Learning from Students: Applying t-Distributions to Explore Accurate and Efficient Formats for LLMs☆27Jun 25, 2024Updated last year
- 社会新闻检索系统,信息检索导论☆13Nov 19, 2019Updated 6 years ago
- Wordpress hosting with auto-scaling - Free Trial Offer • AdFully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
- [ICML2025, NeurIPS2025 Spotlight] Sparse VideoGen 1 & 2: Accelerating Video Diffusion Transformers with Sparse Attention☆681Mar 6, 2026Updated 3 months ago
- [NeurIPS'24 Spotlight, ICLR'25, ICML'25] To speed up Long-context LLMs' inference, approximate and dynamic sparse calculate the attention…☆1,221Apr 8, 2026Updated 2 months ago
- ☆83Oct 18, 2025Updated 8 months ago
- ☆38Jul 19, 2025Updated 11 months ago
- NeurIPS 2024: RAGraph: A General Retrieval-Augmented Graph Learning Framework☆23Feb 4, 2025Updated last year
- Code from PLDI '21 paper "Provable Repair of Deep Neural Networks."☆10Nov 26, 2022Updated 3 years ago
- [COLM 2025] Code for Paper: Learning Adaptive Parallel Reasoning with Language Models☆143Dec 17, 2025Updated 6 months ago