HarderThenHarder / RLLoggingBoard
A visualization tool for deeper understanding and easier debugging of RLHF training.
☆188Updated 2 months ago
Alternatives and similar repositories for RLLoggingBoard:
Users who are interested in RLLoggingBoard compare it to the libraries listed below
- A flexible and efficient training framework for large-scale alignment tasks☆343Updated 2 months ago
- ☆132Updated 3 months ago
- ☆115Updated last week
- A lightweight reproduction of DeepSeek-R1-Zero with in-depth analysis of self-reflection behavior.☆234Updated 3 weeks ago
- Scaling Deep Research via Reinforcement Learning in Real-world Environments.☆312Updated 3 weeks ago
- Related works and background techniques for OpenAI o1☆221Updated 3 months ago
- Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning☆175Updated last month
- ☆524Updated 4 months ago
- Real-time updated, fine-grained reading list on LLM-synthetic-data.🔥☆255Updated 3 months ago
- ☆168Updated last month
- ☆169Updated last year
- ☆183Updated 3 weeks ago
- Agent-R1: Training Powerful LLM Agents with End-to-End Reinforcement Learning☆441Updated last week
- Collect every awesome work about r1!☆356Updated this week
- Super-Efficient RLHF Training of LLMs with Parameter Reallocation☆290Updated last week
- ☆149Updated this week
- ☆318Updated 9 months ago
- Trinity-RFT is a general-purpose, flexible and scalable framework designed for reinforcement fine-tuning (RFT) of large language models (…☆71Updated this week
- A multi-dimensional Chinese alignment evaluation benchmark for large language models (ACL 2024)☆382Updated 8 months ago
- Ling is a MoE LLM provided and open-sourced by InclusionAI.☆145Updated 2 weeks ago
- A Comprehensive Survey on Long Context Language Modeling☆138Updated last month
- Code for Paper (ReMax: A Simple, Efficient and Effective Reinforcement Learning Method for Aligning Large Language Models)☆181Updated last year
- ☆192Updated 2 months ago
- An automated pipeline for evaluating LLMs for role-playing.☆171Updated 7 months ago
- OpenRFT: Adapting Reasoning Foundation Model for Domain-specific Tasks with Reinforcement Fine-Tuning☆135Updated 4 months ago
- InsTag: A Tool for Data Analysis in LLM Supervised Fine-tuning☆256Updated last year
- ☆143Updated 10 months ago
- A series of technical reports on Slow Thinking with LLM☆657Updated 3 weeks ago
- A highly capable 2.4B lightweight LLM using only 1T pre-training data with all details.☆176Updated 3 weeks ago
- ☆276Updated 9 months ago