[R]einforcement [L]earning from [M]odel-rewarded [T]hinking - code for the paper "Language Models That Think, Chat Better"
☆124Oct 27, 2025Updated 4 months ago
Alternatives and similar repositories for RLMT
Users that are interested in RLMT are comparing it to the libraries listed below
- A book about Ph.D. student and research career planning☆28Oct 21, 2025Updated 4 months ago
- code for paper "Accessing higher dimensions for unsupervised word translation"☆22Jun 26, 2023Updated 2 years ago
- QRHead: Query-Focused Retrieval Heads Improve Long-Context Reasoning and Re-ranking☆36Jan 20, 2026Updated last month
- ☆20Mar 26, 2025Updated 11 months ago
- The code for paper "EPO: Entropy-regularized Policy Optimization for LLM Agents Reinforcement Learning"☆37Oct 1, 2025Updated 5 months ago
- The code used to train and run inference with MMDocIR☆32May 29, 2025Updated 9 months ago
- ☆28Oct 22, 2025Updated 4 months ago
- ☆19Nov 12, 2024Updated last year
- ☆22Oct 22, 2024Updated last year
- [ICLR 2025 Oral] Knowledge Entropy Decay during Language Model Pretraining Hinders New Knowledge Acquisition☆19Nov 25, 2024Updated last year
- ☆19Oct 2, 2023Updated 2 years ago
- ☆28Oct 2, 2025Updated 5 months ago
- Teaching Pretrained Language Models to Think Deeper with Retrofitted Recurrence☆59Nov 11, 2025Updated 3 months ago
- Respect to the input tensor instead of parameters of NN☆21Jul 18, 2022Updated 3 years ago
- Simple and scalable tools for data-driven pretraining data selection.☆29Jun 9, 2025Updated 9 months ago
- Code for Evolving Language Models without Labels: Majority Drives Selection, Novelty Promotes Variation (EVOL-RL).☆48Oct 16, 2025Updated 4 months ago
- Code for the 2025 ACL publication "Fine-Tuning on Diverse Reasoning Chains Drives Within-Inference CoT Refinement in LLMs"☆32Jun 25, 2025Updated 8 months ago
- General Reasoner: Advancing LLM Reasoning Across All Domains [NeurIPS25]☆223Nov 27, 2025Updated 3 months ago
- SuperCLUE-Math6: Exploring a new generation of native Chinese multi-turn, multi-step mathematical reasoning datasets☆58Feb 5, 2024Updated 2 years ago
- The repository contains code for Adaptive Data Optimization☆32Dec 9, 2024Updated last year
- Long Context Extension and Generalization in LLMs☆63Sep 21, 2024Updated last year
- ☆29Feb 10, 2025Updated last year
- RewardAnything: Generalizable Principle-Following Reward Models☆45Jun 11, 2025Updated 8 months ago
- [NeurIPS'24] Weak-to-Strong Search: Align Large Language Models via Searching over Small Language Models☆66Dec 10, 2024Updated last year
- [ICLR 2025] Benchmarking Agentic Workflow Generation☆145Feb 19, 2025Updated last year
- Improved Few-Shot Jailbreaking Can Circumvent Aligned Language Models and Their Defenses (NeurIPS 2024)☆65Jan 11, 2025Updated last year
- ACL 2022: Just Rank: Rethinking Evaluation with Word and Sentence Similarities☆35Dec 14, 2022Updated 3 years ago
- Ring is a reasoning MoE LLM provided and open-sourced by InclusionAI, derived from Ling.☆107Aug 5, 2025Updated 7 months ago
- The demo, code and data of FollowRAG☆75Jun 30, 2025Updated 8 months ago
- ☆352Jul 29, 2025Updated 7 months ago
- Llama-3-SynE: A Significantly Enhanced Version of Llama-3 with Advanced Scientific Reasoning and Chinese Language Capabilities | Continued pre-training improves …☆37May 31, 2025Updated 9 months ago
- This work has been accepted to Findings of EMNLP 2025!☆47Sep 5, 2025Updated 6 months ago
- A collection of research papers on low-precision training methods☆64May 10, 2025Updated 10 months ago
- Q-Probe: A Lightweight Approach to Reward Maximization for Language Models☆40Jun 10, 2024Updated last year
- JudgeLRM: Large Reasoning Models as a Judge☆41Jan 29, 2026Updated last month
- Official Implementation for [ICLR26] DefensiveKV: Taming the Fragility of KV Cache Eviction in LLM Inference☆22Feb 9, 2026Updated last month
- Repository of IPBench☆19Jan 4, 2026Updated 2 months ago
- [ICLR'25] Data and code for our paper "Why Does the Effective Context Length of LLMs Fall Short?"☆78Nov 25, 2024Updated last year
- homework in SCUT_SE☆12Nov 9, 2021Updated 4 years ago