modelscope / RM-Gallery
A One-Stop Reward Model Platform
☆45 · Updated this week
Alternatives and similar repositories for RM-Gallery
Users who are interested in RM-Gallery are comparing it to the libraries listed below.
- a-m-team's exploration in large language modeling ☆173 · Updated last month
- ☆274 · Updated last month
- InsTag: A Tool for Data Analysis in LLM Supervised Fine-tuning ☆263 · Updated last year
- ☆47 · Updated 5 months ago
- Trinity-RFT is a general-purpose, flexible and scalable framework designed for reinforcement fine-tuning (RFT) of large language models (… ☆140 · Updated this week
- Related works and background techniques for OpenAI o1 ☆223 · Updated 6 months ago
- [NAACL'24] Self-data filtering of LLM instruction-tuning data using a novel perplexity-based difficulty score, without using any other mo… ☆379 · Updated 3 weeks ago
- A visualization tool for deeper understanding and easier debugging of RLHF training ☆230 · Updated 4 months ago
- ☆142 · Updated last year
- A live reading list for LLM-synthetic-data ☆308 · Updated last week
- ☆181 · Updated last year
- ☆543 · Updated 6 months ago
- A multi-dimensional Chinese alignment evaluation benchmark for large language models (ACL 2024) ☆398 · Updated 11 months ago
- ☆84 · Updated last year
- An Easy-to-use, Scalable and High-performance RLHF Framework designed for Multimodal Models ☆137 · Updated 3 months ago
- ☆68 · Updated 5 months ago
- ☆17 · Updated 2 years ago
- ☆172 · Updated last year
- Agent-R1: Training Powerful LLM Agents with End-to-End Reinforcement Learning ☆662 · Updated last month
- ☆18 · Updated last week
- A collection of phenomena observed during the scaling of big foundation models, which may be developed into consensus, principles, or l… ☆281 · Updated last year
- ☆242 · Updated last week
- Awesome Agent Training ☆188 · Updated last week
- Official code for the paper, "Stop Summation: Min-Form Credit Assignment Is All Process Reward Model Needs for Reasoning" ☆129 · Updated this week
- xVerify: Efficient Answer Verifier for Reasoning Model Evaluations ☆121 · Updated 3 months ago
- Tool-Star: Empowering LLM-brained Multi-Tool Reasoner via Reinforcement Learning ☆201 · Updated last week
- Implementation for "Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs" ☆372 · Updated 5 months ago
- A curated list of awesome works in the Routing LLMs paradigm (👉 contributions to this repository are welcome) ☆47 · Updated this week
- LoRAMoE: Revolutionizing Mixture of Experts for Maintaining World Knowledge in Language Model Alignment ☆357 · Updated last year
- ☆149 · Updated 5 months ago