lqtrung1998 / mwp_ReFTLinks

☆544

Alternatives and similar repositories for mwp_ReFT

Users that are interested in mwp_ReFT are comparing it to the libraries listed below

Sorting:

RUCAIBox / Slow_Thinking_with_LLMs
A series of technical report on Slow Thinking with LLM
☆713Updated last month
THUDM / ReST-MCTS
ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search (NeurIPS 2024)
☆654Updated 6 months ago
0russwest0 / Agent-R1
Agent-R1: Training Powerful LLM Agents with End-to-End Reinforcement Learning
☆697Updated last week
RUCAIBox / R1-Searcher
R1-searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning
☆603Updated 2 months ago
dvlab-research / Step-DPO
Implementation for "Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs"
☆375Updated 6 months ago
wjn1996 / Awesome-LLM-Reasoning-Openai-o1-Survey
The related works and background techniques about Openai o1
☆223Updated 6 months ago
tianyi-lab / Cherry_LLM
[NAACL'24] Self-data filtering of LLM instruction-tuning data using a novel perplexity-based difficulty score, without using any other mo…
☆381Updated last month
qiancheng0 / ToolRL
☆293Updated last month
SimpleBerry / LLaMA-O1
Large Reasoning Models
☆804Updated 7 months ago
princeton-nlp / LESS
[ICML 2024] LESS: Selecting Influential Data for Targeted Instruction Tuning
☆474Updated 9 months ago
pengr / LLM-Synthetic-Data
A live reading list for LLM-synthetic-data.
☆330Updated 2 weeks ago
Qihoo360 / Light-R1
☆734Updated 2 months ago
GAIR-NLP / DeepResearcher
Scaling Deep Research via Reinforcement Learning in Real-world Environments.
☆532Updated 3 months ago
MARIO-Math-Reasoning / Super_MARIO
☆337Updated last month
zhentingqi / rStar
☆953Updated 6 months ago
0russwest0 / Awesome-Agent-RL
☆293Updated 2 months ago
princeton-nlp / SimPO
[NeurIPS 2024] SimPO: Simple Preference Optimization with a Reference-Free Reward
☆912Updated 5 months ago
huggingface / Math-Verify
☆857Updated last month
YuxiXie / MCTS-DPO
This is the repository that contains the source code for the Self-Evaluation Guided MCTS for online DPO.
☆319Updated 11 months ago
ADaM-BJTU / OpenRFT
OpenRFT: Adapting Reasoning Foundation Model for Domain-specific Tasks with Reinforcement Fine-Tuning
☆146Updated 7 months ago
hkust-nlp / deita
Deita: Data-Efficient Instruction Tuning for Alignment [ICLR2024]
☆561Updated 7 months ago
TsinghuaC3I / Awesome-RL-Reasoning-Recipes
Awesome RL Reasoning Recipes ("Triple R")
☆762Updated last month
langfengQ / verl-agent
verl-agent is an extension of veRL, designed for training LLM/VLM agents via RL. verl-agent is also the official code for paper "Group-in…
☆646Updated last week
HarderThenHarder / RLLoggingBoard
A visuailzation tool to make deep understaning and easier debugging for RLHF training.
☆238Updated 5 months ago
a-m-team / a-m-models
a-m-team's exploration in large language modeling
☆178Updated 2 months ago
Eclipsess / Awesome-Efficient-Reasoning-LLMs
Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models
☆538Updated last month
quchangle1 / LLM-Tool-Survey
This is the repository for the Tool Learning survey.
☆416Updated 2 months ago
QwenLM / AutoIF
☆298Updated last year
openreasoner / openr
OpenR: An Open Source Framework for Advanced Reasoning with Large Language Models
☆1,803Updated 6 months ago
lsdefine / simple_GRPO
A very simple GRPO implement for reproducing r1-like LLM thinking.
☆1,219Updated last week