thu-nics / R2R
[NeurIPS'25] The official code implementation for paper "R2R: Efficiently Navigating Divergent Reasoning Paths with Small-Large Model Token Routing"
☆59 · Updated 3 weeks ago
Alternatives and similar repositories for R2R
Users interested in R2R are comparing it to the repositories listed below.
- [NeurIPS'25] dKV-Cache: The Cache for Diffusion Language Models ☆121 · Updated 6 months ago
- [ICLR 2025] Mixture Compressor for Mixture-of-Experts LLMs Gains More ☆62 · Updated 9 months ago
- [ICML 2025] SparseLoRA: Accelerating LLM Fine-Tuning with Contextual Sparsity ☆61 · Updated 4 months ago
- The code repository of "MBQ: Modality-Balanced Quantization for Large Vision-Language Models" ☆66 · Updated 8 months ago
- Locality-aware Parallel Decoding for Efficient Autoregressive Image Generation ☆80 · Updated 4 months ago
- ☆103 · Updated 2 months ago
- The official implementation for [NeurIPS 2025 Oral] Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink… ☆108 · Updated 2 months ago
- [ICML 2025] This is the official PyTorch implementation of "ZipAR: Accelerating Auto-regressive Image Generation through Spatial Locality… ☆53 · Updated 8 months ago
- ☆61 · Updated 4 months ago
- [ICML 2024 Oral] This project is the official implementation of our Accurate LoRA-Finetuning Quantization of LLMs via Information Retenti… ☆67 · Updated last year
- [CoLM'25] The official implementation of the paper "MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression" ☆151 · Updated this week
- [ICML 2024] Pruner-Zero: Evolving Symbolic Pruning Metric from scratch for LLMs ☆96 · Updated last year
- [ICLR 2025] Codebase for "ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing", built on Megatron-LM. ☆99 · Updated 11 months ago
- dParallel: Learnable Parallel Decoding for dLLMs ☆42 · Updated last month
- [ICML 2025] XAttention: Block Sparse Attention with Antidiagonal Scoring ☆255 · Updated 4 months ago
- [NeurIPS 2024] Learning-to-Cache: Accelerating Diffusion Transformer via Layer Caching ☆116 · Updated last year
- An efficient implementation of the NSA (Native Sparse Attention) kernel ☆126 · Updated 5 months ago
- Official PyTorch implementation of the paper "dLLM-Cache: Accelerating Diffusion Large Language Models with Adaptive Caching" (dLLM-Cache… ☆185 · Updated last week
- SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse–Linear Attention ☆142 · Updated 2 weeks ago
- A WebUI for Side-by-Side Comparison of Media (Images/Videos) Across Multiple Folders ☆24 · Updated 9 months ago
- [NeurIPS 2025] VeriThinker: Learning to Verify Makes Reasoning Model Efficient ☆62 · Updated 2 months ago
- M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models ☆45 · Updated 4 months ago
- ☆96 · Updated 9 months ago
- Dimple, the first Discrete Diffusion Multimodal Large Language Model ☆112 · Updated 4 months ago
- [ICML 2025 Oral] Mixture of Lookup Experts ☆55 · Updated 6 months ago
- This repo contains the source code for "Model Tells You What to Discard: Adaptive KV Cache Compression for LLMs" ☆41 · Updated last year
- ☆10 · Updated last year
- [CVPR 2025] Q-DiT: Accurate Post-Training Quantization for Diffusion Transformers ☆73 · Updated last year
- ☆84 · Updated this week
- VideoNSA: Native Sparse Attention Scales Video Understanding ☆61 · Updated 2 weeks ago