[R]einforcement [L]earning from [M]odel-rewarded [T]hinking - code for the paper "Language Models That Think, Chat Better"
☆129Oct 27, 2025Updated 6 months ago
Alternatives and similar repositories for RLMT
Users that are interested in RLMT are comparing it to the libraries listed below.
- ☆24Mar 26, 2025Updated last year
- ☆36Oct 22, 2025Updated 6 months ago
- ☆22Oct 22, 2024Updated last year
- code for paper "Accessing higher dimensions for unsupervised word translation"☆22Jun 26, 2023Updated 2 years ago
- Your efficient and accurate answer verification system for RL training.☆41Jun 23, 2025Updated 10 months ago
- [NeurIPS'24] Weak-to-Strong Search: Align Large Language Models via Searching over Small Language Models☆68Dec 10, 2024Updated last year
- ☆29Oct 2, 2025Updated 7 months ago
- [ICLR 2026] Learning to Parallel: Accelerating Diffusion Large Language Models via Learnable Parallel Decoding☆33Jan 27, 2026Updated 3 months ago
- Official repository for "BLEUBERI: BLEU is a surprisingly effective reward for instruction following"☆32Jun 5, 2025Updated 11 months ago
- JudgeLRM: Large Reasoning Models as a Judge☆42May 6, 2026Updated 2 weeks ago
- ☆31Feb 10, 2025Updated last year
- ☆19Nov 12, 2024Updated last year
- SuperCLUE-Math6: Exploring a new generation of native Chinese multi-turn, multi-step mathematical reasoning datasets☆58Feb 5, 2024Updated 2 years ago
- ☆12Oct 30, 2025Updated 6 months ago
- A simple implementation of ReasonGenRM.☆19Apr 21, 2025Updated last year
- Official code for EMNLP 2025 Findings: DLPO: Towards a Robust, Efficient, and Generalizable Prompt Optimization Framework from a Deep-Le…☆10Dec 25, 2025Updated 4 months ago
- [ICLR 2025] Benchmarking Agentic Workflow Generation☆150Feb 19, 2025Updated last year
- Multi-task keypoint detection code based on YOLOv8. Landmarks or keypoints with different classes and numbers can be …☆12Feb 28, 2023Updated 3 years ago
- The repository contains code for Adaptive Data Optimization☆36Dec 9, 2024Updated last year
- New word discovery for the novel 斗破苍穹 (Battle Through the Heavens)☆13May 12, 2022Updated 4 years ago
- ☆358Jul 29, 2025Updated 9 months ago
- Simple and scalable tools for data-driven pretraining data selection.☆29Jun 9, 2025Updated 11 months ago
- Resources for the Enigmata Project.☆82Aug 13, 2025Updated 9 months ago
- 🏆 The 1st Place Solution for AICity2022 Challenge Track2: Natural Language-Based Vehicle Retrieval.☆12Jul 25, 2022Updated 3 years ago
- ☆19Oct 2, 2023Updated 2 years ago
- [ICLR 2025 Oral] Knowledge Entropy Decay during Language Model Pretraining Hinders New Knowledge Acquisition☆17Nov 25, 2024Updated last year
- General Reasoner: Advancing LLM Reasoning Across All Domains [NeurIPS25]☆225Nov 27, 2025Updated 5 months ago
- ☆13Aug 15, 2025Updated 9 months ago
- Bayesian scaling laws for in-context learning.☆15Mar 12, 2025Updated last year
- The code used to train and run inference with MMDocIR☆33May 29, 2025Updated 11 months ago
- Sun Yat-sen University 2024 computer graphics course project: an OpenGL-based real-time 3D firework particle rendering system☆18Nov 28, 2025Updated 5 months ago
- Evergreen, contamination-free, real-world, domain-specific AI evaluation framework☆136Jan 11, 2026Updated 4 months ago
- AdaRFT: Efficient Reinforcement Finetuning via Adaptive Curriculum Learning☆56Jun 13, 2025Updated 11 months ago
- ☆11May 29, 2024Updated last year
- [COLM'25] A Controlled Study on Long Context Extension and Generalization in LLMs☆65Mar 9, 2026Updated 2 months ago
- ☆13Feb 17, 2025Updated last year
- Teaching Pretrained Language Models to Think Deeper with Retrofitted Recurrence☆64Nov 11, 2025Updated 6 months ago
- Code for paper "ProgGen: Generating Named Entity Recognition Datasets Step-by-step with Self-Reflexive Large Language Models"☆17Mar 29, 2024Updated 2 years ago
- TheWebConf'24 full paper - "Linear-Time Graph Neural Networks for Scalable Recommendations"☆22Jul 23, 2025Updated 9 months ago