Mr-Tieguigui / LLM-Post-Training
☆64 Updated 4 months ago
Alternatives and similar repositories for LLM-Post-Training
Users who are interested in LLM-Post-Training are comparing it to the repositories listed below
- ☆105 Updated 4 months ago
- ☆147 Updated last week
- ☆37 Updated 2 months ago
- ☆50 Updated 7 months ago
- Pre-trained, Scalable, High-performance Reward Models via Policy Discriminative Learning. ☆158 Updated 3 weeks ago
- Scaling Preference Data Curation via Human-AI Synergy ☆113 Updated 3 months ago
- ☆156 Updated last week
- RM-R1: Unleashing the Reasoning Potential of Reward Models ☆140 Updated 3 months ago
- ☆50 Updated 2 months ago
- AgentRewardBench: Evaluating Automatic Evaluations of Web Agent Trajectories ☆37 Updated 2 months ago
- [EMNLP'2025 Industry] Repo for "Z1: Efficient Test-time Scaling with Code" ☆64 Updated 6 months ago
- Code for paper "Patch-Level Training for Large Language Models" ☆88 Updated 11 months ago
- ☆99 Updated last week
- ☆53 Updated 8 months ago
- "What, How, Where, and How Well? A Survey on Test-Time Scaling in Large Language Models" repository ☆71 Updated last week
- Enable Next-sentence Prediction for Large Language Models with Faster Speed, Higher Accuracy and Longer Context ☆38 Updated last year
- Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme ☆143 Updated 6 months ago
- [EMNLP 2025] LightThinker: Thinking Step-by-Step Compression ☆106 Updated 6 months ago
- [ICLR 2025] Benchmarking Agentic Workflow Generation ☆130 Updated 7 months ago
- Towards a Unified View of Large Language Model Post-Training ☆163 Updated last month
- This is the official GitHub repository for our survey paper "Beyond Single-Turn: A Survey on Multi-Turn Interactions with Large Language … ☆119 Updated 5 months ago
- A curated list of awesome LLM Inference-Time Self-Improvement (ITSI, pronounced "itsy") papers from our recent survey: A Survey on Large … ☆96 Updated 9 months ago
- [ICLR 2025] LongPO: Long Context Self-Evolution of Large Language Models through Short-to-Long Preference Optimization ☆40 Updated 7 months ago
- CPPO: Accelerating the Training of Group Relative Policy Optimization-Based Reasoning Models (NeurIPS 2025) ☆154 Updated this week
- Harnessing the Reasoning Economy: A Survey of Efficient Reasoning for Large Language Models ☆117 Updated 4 months ago
- ☆129 Updated 7 months ago
- SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning. COLM 2024 Accepted Paper ☆33 Updated last year
- A comprehensive collection of learning from rewards in the post-training and test-time scaling of LLMs, with a focus on both reward model… ☆56 Updated 4 months ago
- [NeurIPS'24] Weak-to-Strong Search: Align Large Language Models via Searching over Small Language Models ☆62 Updated 10 months ago
- OpenRFT: Adapting Reasoning Foundation Model for Domain-specific Tasks with Reinforcement Fine-Tuning ☆151 Updated 9 months ago