[R]einforcement [L]earning from [M]odel-rewarded [T]hinking - code for the paper "Language Models That Think, Chat Better"
☆129Oct 27, 2025Updated 7 months ago
Alternatives and similar repositories for RLMT
Users that are interested in RLMT are comparing it to the libraries listed below.
- A book about Ph.D. student and research career planning☆29Oct 21, 2025Updated 7 months ago
- ☆24Jun 1, 2026Updated last week
- ☆37May 29, 2026Updated last week
- ☆22Oct 22, 2024Updated last year
- code for paper "Accessing higher dimensions for unsupervised word translation"☆22Jun 26, 2023Updated 2 years ago
- Your efficient and accurate answer verification system for RL training.☆42Jun 23, 2025Updated 11 months ago
- [NeurIPS'24] Weak-to-Strong Search: Align Large Language Models via Searching over Small Language Models☆66Dec 10, 2024Updated last year
- [ICLR 2026] Learning to Parallel: Accelerating Diffusion Large Language Models via Learnable Parallel Decoding☆33Jan 27, 2026Updated 4 months ago
- Official repository for "BLEUBERI: BLEU is a surprisingly effective reward for instruction following"☆32Jun 5, 2025Updated last year
- Code for "Preference Tuning For Toxicity Mitigation Generalizes Across Languages." Paper accepted at Findings of EMNLP 2024☆18Mar 25, 2025Updated last year
- Automatic annotation with YOLOv8 combined with a metric-learning ReID algorithm for cross-camera face tracking☆10May 15, 2024Updated 2 years ago
- JudgeLRM: Large Reasoning Models as a Judge☆42May 6, 2026Updated last month
- ☆19Nov 12, 2024Updated last year
- SuperCLUE-Math6: exploring a new generation of native Chinese multi-turn, multi-step mathematical reasoning datasets☆58Feb 5, 2024Updated 2 years ago
- A simple implementation of ReasonGenRM.☆19Apr 21, 2025Updated last year
- [ICLR 2025] Benchmarking Agentic Workflow Generation☆152Feb 19, 2025Updated last year
- ☆10Dec 16, 2023Updated 2 years ago
- The repository contains code for Adaptive Data Optimization☆36Dec 9, 2024Updated last year
- New-word discovery for the novel 斗破苍穹 (Battle Through the Heavens)☆13May 12, 2022Updated 4 years ago
- QRHead: Query-Focused Retrieval Heads Improve Long-Context Reasoning and Re-ranking☆39Jan 20, 2026Updated 4 months ago
- ☆359Jul 29, 2025Updated 10 months ago
- Simple and scalable tools for data-driven pretraining data selection.☆29Jun 9, 2025Updated last year
- Multi-step reasoning MLLM☆24Mar 8, 2026Updated 3 months ago
- Resources for the Enigmata Project.☆82Aug 13, 2025Updated 9 months ago
- ☆19Oct 2, 2023Updated 2 years ago
- [ICLR 2025 Oral] Knowledge Entropy Decay during Language Model Pretraining Hinders New Knowledge Acquisition☆17Nov 25, 2024Updated last year
- LLM-guided hyperparameter tuning☆10Oct 7, 2023Updated 2 years ago
- ☆14Aug 15, 2025Updated 9 months ago
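- Automatic course-evaluation script for Harbin Institute of Technology, Weihai☆12Feb 4, 2024Updated 2 years ago
- Retrieval and fine-grained reranking using Qwen3's Embedding and Reranker models☆21Jun 22, 2025Updated 11 months ago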
- Bayesian scaling laws for in-context learning.☆15Mar 12, 2025Updated last year
- The code used to train and run inference with MMDocIR☆33May 29, 2025Updated last year
- Sun Yat-sen University 2024 computer graphics final project: a real-time 3D firework particle rendering system based on OpenGL☆18Nov 28, 2025Updated 6 months ago
- Evergreen, contamination-free, real-world, domain-specific AI evaluation framework☆137Jan 11, 2026Updated 4 months ago
- AdaRFT: Efficient Reinforcement Finetuning via Adaptive Curriculum Learning☆56Jun 13, 2025Updated 11 months ago
- YOLOv8 inference on hisi3536a☆11Dec 15, 2023Updated 2 years ago
- ☆11May 29, 2024Updated 2 years ago
- [COLM'25] A Controlled Study on Long Context Extension and Generalization in LLMs☆65Mar 9, 2026Updated 3 months ago
- Project repository of team 拔萝卜的工程队: ranked first on the Huawei track leaderboard and won the National Grand Prize in the open-call special competition of the 19th Challenge Cup☆34Mar 18, 2026Updated 2 months ago