liushunyu / awesome-direct-preference-optimization
A Survey of Direct Preference Optimization (DPO)
☆81 Updated 3 months ago
Alternatives and similar repositories for awesome-direct-preference-optimization
Users who are interested in awesome-direct-preference-optimization are comparing it to the repositories listed below.
- ☆123 Updated 4 months ago
- AlphaEdit: Null-Space Constrained Knowledge Editing for Language Models, ICLR 2025 (Outstanding Paper) ☆338 Updated 3 months ago
- ☆275 Updated 3 months ago
- RM-R1: Unleashing the Reasoning Potential of Reward Models ☆138 Updated 3 months ago
- VeriGUI: Verifiable Long-Chain GUI Dataset ☆81 Updated 2 months ago
- Paper List of Inference/Test Time Scaling/Computing ☆313 Updated last month
- A comprehensive collection of process reward models. ☆111 Updated last week
- (ICLR'25) A Comprehensive Framework for Developing and Evaluating Multimodal Role-Playing Agents ☆86 Updated 8 months ago
- Interleaving Reasoning: Next-Generation Reasoning Systems for AGI ☆182 Updated last month
- [NeurIPS 2025] Think Silently, Think Fast: Dynamic Latent Compression of LLM Reasoning Chains ☆57 Updated 2 months ago
- A curated list of awesome LLM Inference-Time Self-Improvement (ITSI, pronounced "itsy") papers from our recent survey: A Survey on Large … ☆96 Updated 9 months ago
- 😎 A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, Agent, and Beyond ☆306 Updated 2 weeks ago
- Official implementation of GUI-R1: A Generalist R1-Style Vision-Language Action Model For GUI Agents ☆190 Updated 5 months ago
- Survey on Data-centric Large Language Models ☆84 Updated last year
- ☆171 Updated 5 months ago
- ACL'2025: SoftCoT: Soft Chain-of-Thought for Efficient Reasoning with LLMs. and preprint: SoftCoT++: Test-Time Scaling with Soft Chain-of… ☆54 Updated 4 months ago
- A curated list of Awesome-LLM-Ensemble papers for the survey "Harnessing Multiple Large Language Models: A Survey on LLM Ensemble" ☆144 Updated last week
- 🔧Tool-Star: Empowering LLM-brained Multi-Tool Reasoner via Reinforcement Learning ☆265 Updated last month
- SynthRL: Scaling Visual Reasoning with Verifiable Data Synthesis ☆67 Updated 2 months ago
- "What, How, Where, and How Well? A Survey on Test-Time Scaling in Large Language Models" repository ☆71 Updated this week
- Pre-trained, Scalable, High-performance Reward Models via Policy Discriminative Learning. ☆158 Updated 3 weeks ago
- ☆333 Updated 2 months ago
- A research repo for experiments about Reinforcement Finetuning ☆52 Updated 6 months ago
- Latest Advances on Long Chain-of-Thought Reasoning ☆523 Updated 2 months ago
- The Entropy Mechanism of Reinforcement Learning for Large Language Model Reasoning. ☆345 Updated 3 months ago
- Generative AI Act II: Test Time Scaling Drives Cognition Engineering ☆207 Updated 5 months ago
- RFTT: Reasoning with Reinforced Functional Token Tuning ☆29 Updated 4 months ago
- Extrapolating RLVR to General Domains without Verifiers ☆173 Updated 2 months ago
- 📖 This is a repository for organizing papers, codes, and other resources related to Latent Reasoning. ☆237 Updated 2 weeks ago
- CoT-Valve: Length-Compressible Chain-of-Thought Tuning ☆86 Updated 8 months ago