Mr-Tieguigui / LLM-Post-Training
☆54Updated 3 months ago
Alternatives and similar repositories for LLM-Post-Training
Users who are interested in LLM-Post-Training are comparing it to the repositories listed below
- ☆96Updated 3 months ago
- Scaling Preference Data Curation via Human-AI Synergy☆100Updated last month
- ☆122Updated this week
- Pre-trained, Scalable, High-performance Reward Models via Policy Discriminative Learning.☆150Updated last month
- [Preprint] On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification.☆392Updated last week
- Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme☆138Updated 4 months ago
- ☆27Updated 2 weeks ago
- "What, How, Where, and How Well? A Survey on Test-Time Scaling in Large Language Models" repository☆63Updated this week
- Enable Next-sentence Prediction for Large Language Models with Faster Speed, Higher Accuracy and Longer Context☆35Updated last year
- RM-R1: Unleashing the Reasoning Potential of Reward Models☆127Updated 2 months ago
- MMSearch-R1 is an end-to-end RL framework that enables LMMs to perform on-demand, multi-turn search with real-world multimodal search too…☆296Updated last week
- ☆50Updated 5 months ago
- ☆74Updated 2 weeks ago
- A curated list of awesome LLM Inference-Time Self-Improvement (ITSI, pronounced "itsy") papers from our recent survey: A Survey on Large …☆95Updated 8 months ago
- [ICLR 2025] LongPO: Long Context Self-Evolution of Large Language Models through Short-to-Long Preference Optimization☆41Updated 6 months ago
- The official repo of One RL to See Them All: Visual Triple Unified Reinforcement Learning☆311Updated 3 months ago
- A comprehensive collection of learning from rewards in the post-training and test-time scaling of LLMs, with a focus on both reward model…☆55Updated 2 months ago
- ☆53Updated 6 months ago
- Parameter-Efficient Fine-Tuning for Foundation Models☆88Updated 5 months ago
- ☆34Updated 2 weeks ago
- ☆87Updated last week
- CPPO: Accelerating the Training of Group Relative Policy Optimization-Based Reasoning Models☆149Updated 2 months ago
- MiroTrain is an efficient and algorithm-first framework for post-training large agentic models.☆77Updated this week
- ☆89Updated 3 months ago
- The official GitHub page for the survey paper "Discrete Tokenization for Multimodal LLMs: A Comprehensive Survey". And this paper is unde…☆55Updated 3 weeks ago
- [NeurIPS'24] Weak-to-Strong Search: Align Large Language Models via Searching over Small Language Models☆62Updated 8 months ago
- AgentRewardBench: Evaluating Automatic Evaluations of Web Agent Trajectories☆35Updated 3 weeks ago
- ☆147Updated 3 months ago
- [NeurIPS 2024] CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs☆124Updated 4 months ago
- Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning☆190Updated 5 months ago