☆14 · Mar 20, 2025 · Updated 11 months ago
Alternatives and similar repositories for Value-Residual-Learning
Users who are interested in Value-Residual-Learning are comparing it to the libraries listed below.
- ☆19 · Jun 4, 2025 · Updated 8 months ago
- ☆21 · Jul 21, 2025 · Updated 7 months ago
- UniEval: Unified Holistic Evaluation for Unified Multimodal Understanding and Generation ☆22 · May 16, 2025 · Updated 9 months ago
- [ICML 2025] Code for "R2-T2: Re-Routing in Test-Time for Multimodal Mixture-of-Experts" ☆19 · Mar 10, 2025 · Updated 11 months ago
- This repo contains code for the paper "Both Text and Images Leaked! A Systematic Analysis of Data Contamination in Multimodal LLM" ☆17 · Oct 17, 2025 · Updated 4 months ago
- ☆21 · May 3, 2025 · Updated 9 months ago
- Compiler-R1: Towards Agentic Compiler Auto-tuning with Reinforcement Learning ☆28 · Jul 14, 2025 · Updated 7 months ago
- ☆59 · May 13, 2025 · Updated 9 months ago
- The open-source materials for the paper "Sparsing Law: Towards Large Language Models with Greater Activation Sparsity". ☆30 · Nov 12, 2024 · Updated last year
- ☆17 · Aug 1, 2025 · Updated 6 months ago
- Code for paper "Analog Foundation Models" ☆30 · Sep 18, 2025 · Updated 5 months ago
- ☆45 · May 27, 2025 · Updated 9 months ago
- Resa: Transparent Reasoning Models via SAEs ☆47 · Sep 23, 2025 · Updated 5 months ago
- ☆23 · Sep 19, 2024 · Updated last year
- The first spoken long-text dataset derived from live streams, designed to reflect the redundancy-rich and conversational nature of real-w… ☆12 · Jun 28, 2025 · Updated 7 months ago
- A Text2SQL benchmark for evaluation of Large Language Models ☆41 · Updated this week
- Official implementation of the paper "You Do Not Fully Utilize Transformer's Representation Capacity" ☆32 · May 28, 2025 · Updated 8 months ago
- Code for RATIONALYST: Pre-training Process-Supervision for Improving Reasoning https://arxiv.org/pdf/2410.01044 ☆35 · Oct 3, 2024 · Updated last year
- KV Cache Steering for Inducing Reasoning in Small Language Models ☆46 · Jul 24, 2025 · Updated 7 months ago
- [NeurIPS ENLSP Workshop'24] CSKV: Training-Efficient Channel Shrinking for KV Cache in Long-Context Scenarios ☆16 · Oct 18, 2024 · Updated last year
- The official repository for SkyLadder: Better and Faster Pretraining via Context Window Scheduling ☆42 · Dec 29, 2025 · Updated last month
- ☆18 · Jun 10, 2025 · Updated 8 months ago
- Official Implementation of APB (ACL 2025 main Oral) and Spava. ☆33 · Jan 30, 2026 · Updated 3 weeks ago
- [NeurIPS'25] The official code implementation for paper "R2R: Efficiently Navigating Divergent Reasoning Paths with Small-Large Model Tok… ☆78 · Feb 10, 2026 · Updated 2 weeks ago
- ☆35 · May 16, 2025 · Updated 9 months ago
- Instruction-following benchmark for large reasoning models ☆44 · Aug 9, 2025 · Updated 6 months ago
- Large-scale semi-supervised framework with 1B+ labeled masks from 48K+ datasets with test-time adaptation to new domains (ICCV25). ☆44 · Dec 28, 2025 · Updated last month
- [AAAI 2025] SparX: A Sparse Cross-Layer Connection Mechanism for Hierarchical Vision Mamba and Transformer Networks ☆44 · Jun 12, 2025 · Updated 8 months ago
- A Framework for Evaluating AI Agent Safety in Realistic Environments ☆30 · Oct 2, 2025 · Updated 4 months ago
- ☆11 · Jun 22, 2025 · Updated 8 months ago
- Martingale posterior neural networks for fast sequential decision making @ NeurIPS 2025 ☆23 · Nov 13, 2025 · Updated 3 months ago
- Symphony: A decentralized multi-agent framework that enables intelligent agents to collaborate seamlessly across heterogeneous edge devi… ☆30 · Oct 30, 2025 · Updated 4 months ago
- The official implementation of the paper "DaMo: Data Mixing Optimizer in Fine-tuning Multimodal LLMs for Mobile Phone Agents" ☆29 · Oct 23, 2025 · Updated 4 months ago
- [ICLR 2026] ParallelBench: Understanding the Tradeoffs of Parallel Decoding in Diffusion LLMs ☆30 · Updated this week
- [ICML 2024] Official Repository for the paper "Transformers Get Stable: An End-to-End Signal Propagation Theory for Language Models" ☆10 · Jul 19, 2024 · Updated last year
- Implementation of SmoothCache, a project aimed at speeding-up Diffusion Transformer (DiT) based GenAI models with error-guided caching. ☆47 · Jul 17, 2025 · Updated 7 months ago
- ☆47 · Apr 29, 2025 · Updated 9 months ago
- [CVPR2025] Breaking the Low-Rank Dilemma of Linear Attention ☆38 · Mar 11, 2025 · Updated 11 months ago
- This repository reproduces the results in the paper "How expressive are transformers in spectral domain for graphs?" (published in TMLR) ☆12 · Jul 10, 2022 · Updated 3 years ago