bytedance / FTRL
Feedback-Driven Tool-Use Improvements in Large Language Models via Automated Build Environments
☆47 Updated this week
Alternatives and similar repositories for FTRL
Users who are interested in FTRL are comparing it to the libraries listed below.
- Scaling Preference Data Curation via Human-AI Synergy ☆135 Updated 6 months ago
- ☆87 Updated 4 months ago
- AgentRewardBench: Evaluating Automatic Evaluations of Web Agent Trajectories ☆40 Updated 5 months ago
- ☆99 Updated 5 months ago
- The code for paper: Decoupled Planning and Execution: A Hierarchical Reasoning Framework for Deep Search ☆63 Updated 6 months ago
- ☆93 Updated 7 months ago
- WideSearch: Benchmarking Agentic Broad Info-Seeking ☆110 Updated 3 months ago
- ☆32 Updated 5 months ago
- MTU-Bench: A Multi-granularity Tool-Use Benchmark for Large Language Models ☆57 Updated 5 months ago
- ☆195 Updated 2 weeks ago
- IKEA: Reinforced Internal-External Knowledge Synergistic Reasoning for Efficient Adaptive Search Agent ☆67 Updated 7 months ago
- The raw UserRL repo under construction ☆86 Updated 3 months ago
- A very simple reproduction of Deepseek-R1-Zero and Deepseek-R1, using the "24 game" as an example. Uses zero-RL, SFT, and SFT+RL to elicit the LLM's capacity for self-verification and reflection. Clean, minimal, accessible reproduction of Dee… ☆33 Updated 9 months ago
- Implementation for OAgents: An Empirical Study of Building Effective Agents ☆299 Updated 2 months ago
- FastCuRL: Curriculum Reinforcement Learning with Stage-wise Context Scaling for Efficient LLM Reasoning ☆55 Updated 3 months ago
- [NeurIPS 2025] The official repo of SynLogic: Synthesizing Verifiable Reasoning Data at Scale for Learning Logical Reasoning and Beyond ☆187 Updated 6 months ago
- ☆46 Updated 7 months ago
- Revisiting Mid-training in the Era of Reinforcement Learning Scaling ☆182 Updated 5 months ago
- ☆117 Updated 7 months ago
- SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning. COLM 2024 Accepted Paper ☆32 Updated last year
- [AAAI 2026] Official codebase for "GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning". ☆94 Updated 2 months ago
- DeepDive: Advancing Deep Search Agents with Knowledge Graphs and Multi-Turn RL ☆224 Updated 3 months ago
- MPO: Boosting LLM Agents with Meta Plan Optimization (EMNLP 2025 Findings) ☆71 Updated 4 months ago
- xVerify: Efficient Answer Verifier for Reasoning Model Evaluations ☆142 Updated last month
- [ACL 2025] Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems ☆119 Updated 7 months ago
- ☆75 Updated 6 months ago
- rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking ☆39 Updated 11 months ago
- A highly capable 2.4B lightweight LLM using only 1T pre-training data with all details. ☆222 Updated 5 months ago
- The demo, code and data of FollowRAG ☆75 Updated 6 months ago
- MrlX: A Multi-Agent Reinforcement Learning Framework ☆160 Updated last month