thu-nics / R2R

The official code implementation for the paper "R2R: Efficiently Navigating Divergent Reasoning Paths with Small-Large Model Token Routing"

☆43

Alternatives and similar repositories for R2R

Users who are interested in R2R are comparing it to the repositories listed below.

horseee / dKV-Cache
☆89 Updated 2 months ago
pixeli99 / MixLN
[ICLR 2025] Official Pytorch Implementation of "Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN" by Pengxia…
☆25 Updated 2 weeks ago
Infini-AI-Lab / Multiverse
☆81 Updated last week
z-lab / sparselora
[ICML 2025] SparseLoRA: Accelerating LLM Fine-Tuning with Contextual Sparsity
☆49 Updated last month
ThisisBillhe / ZipAR
[ICML 2025] This is the official PyTorch implementation of "ZipAR: Accelerating Auto-regressive Image Generation through Spatial Locality…
☆51 Updated 4 months ago
thu-nics / MBQ
The code repository of "MBQ: Modality-Balanced Quantization for Large Vision-Language Models"
☆50 Updated 4 months ago
yu-rp / Dimple
Dimple, the first Discrete Diffusion Multimodal Large Language Model
☆85 Updated last month
OpenSparseLLMs / Linearization
☆54 Updated last month
czg1225 / VeriThinker
VeriThinker: Learning to Verify Makes Reasoning Model Efficient
☆49 Updated 3 weeks ago
thu-ml / ReMoE
[ICLR2025] Codebase for "ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing", built on Megatron-LM.
☆85 Updated 7 months ago
Aaronhuang-778 / Mixture-Compressor-MoE
[ICLR 2025] Mixture Compressor for Mixture-of-Experts LLMs Gains More
☆48 Updated 5 months ago
AkideLiu / MiniCache
☆10 Updated 11 months ago
zhixuan-lin / forgetting-transformer
[ICLR 2025] Official PyTorch implementation of "Forgetting Transformer: Softmax Attention with a Forget Gate"
☆118 Updated last month
maomaocun / dLLM-cache
Official PyTorch implementation of the paper "dLLM-Cache: Accelerating Diffusion Large Language Models with Adaptive Caching" (dLLM-Cache…
☆132 Updated this week
htqin / IR-QLoRA
[ICML 2024 Oral] This project is the official implementation of our Accurate LoRA-Finetuning Quantization of LLMs via Information Retenti…
☆67 Updated last year
TianjinYellow / SPAM-Optimizer
☆34 Updated 4 months ago
StargazerX0 / ScaleKV
ScaleKV: Memory-Efficient Visual Autoregressive Modeling with Scale-Aware KV Cache Compression
☆46 Updated 2 months ago
mit-han-lab / lpd
Locality-aware Parallel Decoding for Efficient Autoregressive Image Generation
☆65 Updated 3 weeks ago
sustcsonglin / linear-attention-and-beyond-slides
☆79 Updated 5 months ago
OpenSparseLLMs / Skip-DiT
✈️ [ICCV 2025] Towards Stabilized and Efficient Diffusion Transformers through Long-Skip-Connections with Spectral Constraints
☆72 Updated last month
nbasyl / DoRA
Official implementation of "DoRA: Weight-Decomposed Low-Rank Adaptation"
☆124 Updated last year
MarkXCloud / CSpD
The official repo of continuous speculative decoding
☆27 Updated 4 months ago
pprp / Pruner-Zero
[ICML24] Pruner-Zero: Evolving Symbolic Pruning Metric from scratch for LLMs
☆91 Updated 8 months ago
OpenSparseLLMs / Open-Pandora
Open-Pandora: On-the-fly Control Video Generation
☆34 Updated 8 months ago
Lucky-Lance / Expert_Sparsity
[ACL 2024] Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models
☆95 Updated last year
yunfeixie233 / ViGaL
☆50 Updated last month
Lucky-Lance / SPP
[ICML 2024] SPP: Sparsity-Preserved Parameter-Efficient Fine-Tuning for Large Language Models
☆21 Updated last year
chenllliang / DnD-Transformer
[ICLR 2025] Source code for paper "A Spark of Vision-Language Intelligence: 2-Dimensional Autoregressive Transformer for Efficient Finegr…
☆76 Updated 8 months ago
horseee / learning-to-cache
[NeurIPS 2024] Learning-to-Cache: Accelerating Diffusion Transformer via Layer Caching
☆110 Updated last year
OpenSparseLLMs / MoM
☆95 Updated 3 months ago