xiaomi-research / colar
Think Silently, Think Fast: Dynamic Latent Compression of LLM Reasoning Chains
☆22Updated last week
Alternatives and similar repositories for colar
Users who are interested in colar are comparing it to the repositories listed below
- ☆46Updated 7 months ago
- [ACL 2024] Making Long-Context Language Models Better Multi-Hop Reasoners☆16Updated last year
- RM-R1: Unleashing the Reasoning Potential of Reward Models☆108Updated 3 weeks ago
- Code for "CREAM: Consistency Regularized Self-Rewarding Language Models", ICLR 2025.☆22Updated 4 months ago
- ☆53Updated last week
- [ICML 2025] M-STAR (Multimodal Self-Evolving TrAining for Reasoning) Project. Diving into Self-Evolving Training for Multimodal Reasoning☆60Updated 6 months ago
- A Survey on the Honesty of Large Language Models☆57Updated 6 months ago
- ☆74Updated last year
- [EMNLP 2024] mDPO: Conditional Preference Optimization for Multimodal Large Language Models.☆75Updated 7 months ago
- ACL'2025: SoftCoT: Soft Chain-of-Thought for Efficient Reasoning with LLMs. and preprint: SoftCoT++: Test-Time Scaling with Soft Chain-of…☆28Updated 3 weeks ago
- Implementation for the paper "The Surprising Effectiveness of Negative Reinforcement in LLM Reasoning"☆59Updated 2 weeks ago
- G1: Bootstrapping Perception and Reasoning Abilities of Vision-Language Model via Reinforcement Learning☆64Updated last month
- my commonly-used tools☆56Updated 5 months ago
- The Good, The Bad, and The Greedy: Evaluation of LLMs Should Not Ignore Non-Determinism☆30Updated 11 months ago
- The official implementation of SPC: Evolving Self-Play Critic via Adversarial Games for LLM Reasoning☆15Updated last month
- 🔍 Awesome Agentic Search is a curated list of papers, tools, and resources on agentic search—where AI agents plan, search, and reason to…☆31Updated last week
- ☆59Updated 9 months ago
- ☆19Updated last month
- [ACL' 25] The official code repository for PRMBench: A Fine-grained and Challenging Benchmark for Process-Level Reward Models.☆73Updated 4 months ago
- ☆46Updated 2 months ago
- NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation☆69Updated 3 weeks ago
- Official codebase for "GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning".☆75Updated 3 weeks ago
- ☆16Updated 7 months ago
- Instruct Once, Chat Consistently in Multiple Rounds: An Efficient Tuning Framework for Dialogue (ACL 2024)☆23Updated 10 months ago
- The official repository of "Improving Large Language Models via Fine-grained Reinforcement Learning with Minimum Editing Constraint"☆38Updated last year
- A comprehensive collection of process reward models.☆92Updated 2 weeks ago
- Less is More: Mitigating Multimodal Hallucination from an EOS Decision Perspective (ACL 2024)☆51Updated 7 months ago
- EMNLP2023 - InfoSeek: A New VQA Benchmark focus on Visual Info-Seeking Questions☆23Updated last year
- RAG-RewardBench: Benchmarking Reward Models in Retrieval Augmented Generation for Preference Alignment☆16Updated 6 months ago
- MoCLE (First MLLM with MoE for instruction customization and generalization!) (https://arxiv.org/abs/2312.12379)☆40Updated last year