Papers related to Direct Preference Optimization (DPO)
☆20Jul 16, 2024Updated last year
Alternatives and similar repositories for awesome-DPO
Users that are interested in awesome-DPO are comparing it to the libraries listed below.
- ☆29Apr 28, 2026Updated last month
- Interpreting Learned Search and Planning: Reverse-engineering recurrent convolutional networks (DRC) that play Sokoban☆20Jun 29, 2025Updated 11 months ago
- ☆10May 16, 2021Updated 5 years ago
- [ICML 2023] Meta-SAGE: Scale Meta-Learning Scheduled Adaptation with Guided Exploration for Mitigating Scale Shift on Combinatorial Optim…☆10Dec 19, 2023Updated 2 years ago
- ☆35Jul 2, 2025Updated 11 months ago
- ☆13Jul 15, 2024Updated last year
- ☆13Sep 24, 2023Updated 2 years ago
- latex notes w/ neovim☆14Apr 18, 2025Updated last year
- 🔥🔥🔥Latest Papers, Codes on Uncertainty-based RL☆59Aug 24, 2025Updated 9 months ago
- Estimating neural network runtime characteristics☆12Mar 25, 2023Updated 3 years ago
- ☆12Sep 11, 2022Updated 3 years ago
- An open-source server implementation for Qwen2-VL series model inference using FastAPI.☆10Nov 20, 2024Updated last year
- The official implementation of Preference Data Reward-Augmentation.☆18May 1, 2025Updated last year
- From-Classification-to-Clinical☆13Apr 26, 2024Updated 2 years ago
- ☆16May 22, 2025Updated last year
- Data for evaluating GPT-4V☆11Oct 26, 2023Updated 2 years ago
- Algebraic value editing in pretrained language models☆70Nov 1, 2023Updated 2 years ago
- A Pytorch implementation of Pensieve (SIGCOMM'18)☆12Jun 17, 2020Updated 5 years ago
- A list of papers regarding generalization in (deep) reinforcement learning☆11Aug 13, 2023Updated 2 years ago
- Codebase for Paper Reusing Embeddings: Reproducible Reward Model Research in Large Language Model Alignment without GPUs☆22Apr 24, 2025Updated last year
- "Multimodal Large Model Deployment and Fine-Tuning Guide": quickly deploy and fine-tune multimodal large models☆14Dec 4, 2024Updated last year
- Record experiment data easily☆14Aug 13, 2022Updated 3 years ago
- FairGAN: GANs-based Fairness-aware Learning for Recommendations with Implicit Feedback☆15Oct 8, 2022Updated 3 years ago
- ☆18Oct 8, 2024Updated last year
- Models, data, and codes for the paper: MetaAligner: Towards Generalizable Multi-Objective Alignment of Language Models☆25Sep 26, 2024Updated last year
- ☆17Jun 30, 2020Updated 5 years ago
- [EOL] Process Polygon Package to DOMjudge Package.☆12Apr 24, 2021Updated 5 years ago
- The datasets of TSAD☆17Oct 20, 2025Updated 7 months ago
- helpers for working with maven via emacs (derives from ant-el)☆17Jun 1, 2021Updated 5 years ago
- The collection of related papers and resources for the paper Time Series Analysis for Education: Methods, Applications, and Future Direct…☆20Apr 12, 2025Updated last year
- All in How You Ask for It: Simple Black-Box Method for Jailbreak Attacks☆17Apr 24, 2024Updated 2 years ago
- [SIGIR'25] Code of "Generative Recommender with End-to-End Learnable Item Tokenization".☆32Apr 17, 2025Updated last year
- [ICLR 2026] Do Not Let Low-Probability Tokens Over-Dominate in RL for LLMs☆46May 20, 2025Updated last year
- "AI Commit Message Tool uses AI to automatically generate concise and professional Git commit messages, which you can then edit and confi…☆14Jul 14, 2025Updated 10 months ago
- LLMAD code☆30Oct 31, 2024Updated last year
- ☆17Aug 1, 2025Updated 10 months ago
- GPU-based Massively Parallel Environments for Large-Scale Combinatorial Optimization (CO) Problems Using Reinforcement Learning☆31Mar 16, 2026Updated 2 months ago
- CEBaB: Estimating the Causal Effects of Real-World Concepts on NLP Model Behavior☆13Sep 21, 2022Updated 3 years ago
- ☆29Jul 16, 2024Updated last year