Improving Math reasoning through Direct Preference Optimization with Verifiable Pairs
☆19 · Mar 20, 2025 · Updated last year
Alternatives and similar repositories for DPO-VP
Users that are interested in DPO-VP are comparing it to the libraries listed below.
- [NeurIPS 2025 D&B Track] MLR-Bench: Evaluating AI Agents on Open-Ended Machine Learning Research ☆28 · Sep 23, 2025 · Updated 7 months ago
- ☆23 · Jul 5, 2024 · Updated last year
- Windows ARM64 build for TeX Live ☆15 · Mar 13, 2025 · Updated last year
- Trust Region Preference Approximation: A simple and stable reinforcement learning algorithm for LLM reasoning ☆15 · Jun 28, 2025 · Updated 10 months ago
- Code for paper: Reinforced Vision Perception with Tools ☆72 · Oct 3, 2025 · Updated 7 months ago
- Official code for ACT: Empowering Decision Transformer with Dynamic Programming via Advantage Conditioning (AAAI'24) ☆17 · Feb 10, 2024 · Updated 2 years ago
- ☆30 · Oct 8, 2025 · Updated 6 months ago
- CrysText: A Generative AI Approach for Text-Conditioned Crystal Structure Generation using LLM ☆17 · Nov 3, 2025 · Updated 6 months ago
- [NeurIPS 2025] Official Implementation of paper "Sherlock: Self-Correcting Reasoning in Vision-Language Models" ☆29 · Sep 18, 2025 · Updated 7 months ago
- A Monte Carlo tree search solver for games with perfect information. ☆10 · Aug 26, 2016 · Updated 9 years ago
- IPO: Interpretable Prompt Optimization for Vision-Language Models (NeurIPS 2024) ☆15 · Mar 4, 2025 · Updated last year
- Official website for TIC-VLA ☆42 · Feb 3, 2026 · Updated 3 months ago
- An autonomous vehicle learns how to navigate efficiently at a crossroad ☆16 · Jan 31, 2018 · Updated 8 years ago
- Self-Teaching Notes on Gradient Leakage Attacks against GPT-2 models. ☆14 · Mar 18, 2024 · Updated 2 years ago
- The official PyTorch implementation of the paper "Generalizing Consistency Policy to Visual RL with Prioritized Proximal Experience Regul… ☆15 · Nov 10, 2024 · Updated last year
- [NeurIPS 2024 Oral] Aligner: Efficient Alignment by Learning to Correct ☆193 · Jan 16, 2025 · Updated last year
- The official implementation of paper "Overcoming Data and Model heterogeneities in Decentralized Federated Learning via Synthetic Anchors… ☆15 · Jun 14, 2024 · Updated last year
- ☆16 · Jun 14, 2023 · Updated 2 years ago
- Symmetrical Visual Contrastive Optimization: Aligning Vision-Language Models with Minimal Contrastive Images ☆19 · Jun 4, 2025 · Updated 11 months ago
- [WACV 2024] Enhancing Multimodal Compositional Reasoning of Visual Language Models with Generative Negative Mining, WACV 2024 ☆13 · Jan 3, 2024 · Updated 2 years ago
- Direct preference optimization with f-divergences. ☆16 · Nov 3, 2024 · Updated last year
- Reward Guided Latent Consistency Distillation ☆27 · Oct 9, 2024 · Updated last year
- [ICLR 2025] Data-Augmented Phrase-Level Alignment for Mitigating Object Hallucination ☆21 · Jan 27, 2025 · Updated last year
- Code for paper: Reward Uncertainty for Exploration in Preference-based Reinforcement Learning ☆15 · May 26, 2022 · Updated 3 years ago
- A reproduction of the paper - Aligner: Achieving Efficient Alignment through Weak-to-Strong Correction ☆22 · May 29, 2024 · Updated last year
- ☆13 · Sep 14, 2023 · Updated 2 years ago
- Code repo for "Harnessing Negative Signals: Reinforcement Distillation from Teacher Data for LLM Reasoning" ☆33 · Jul 25, 2025 · Updated 9 months ago
- ☆17 · May 1, 2023 · Updated 3 years ago
- The code of AMoPO: Adaptive Multi-objective Preference Optimization without Rewards and References. ☆45 · Sep 14, 2025 · Updated 7 months ago
- VisualToolAgent (VisTA): A Reinforcement Learning Framework for Visual Tool Selection