Dataset Reset Policy Optimization
☆31Apr 12, 2024Updated last year
Alternatives and similar repositories for drpo
Users that are interested in drpo are comparing it to the libraries listed below.
- Reinforcement Learning via Regressing Relative Rewards☆40Dec 12, 2024Updated last year
- ☆13Jul 2, 2025Updated 8 months ago
- This is the official repository for "Safer-Instruct: Aligning Language Models with Automated Preference Data"☆17Feb 22, 2024Updated 2 years ago
- ☆131Feb 6, 2024Updated 2 years ago
- Extensive Self-Contrast Enables Feedback-Free Language Model Alignment☆20Apr 2, 2024Updated last year
- ☆16Oct 5, 2021Updated 4 years ago
- ☆16Jul 23, 2024Updated last year
- ☆13Jun 3, 2022Updated 3 years ago
- Open source code combining implementations of Upside Down Reinforcement Learning and Reward Conditioned Policies☆19Mar 10, 2021Updated 5 years ago
- Representation Learning in RL☆13Jun 1, 2022Updated 3 years ago
- ☆160Nov 23, 2024Updated last year
- Code & Data for our Paper "Alleviating Hallucinations of Large Language Models through Induced Hallucinations"☆69Feb 27, 2024Updated 2 years ago
- ☆35Sep 14, 2024Updated last year
- Codebase for Instruction Following without Instruction Tuning☆36Sep 24, 2024Updated last year
- ☆13Jan 22, 2025Updated last year
- ☆21Sep 5, 2023Updated 2 years ago
- ☆99Jun 27, 2024Updated last year
- Expression Snippet Transformer for Robust Video-based Facial Expression Recognition☆17Jan 27, 2024Updated 2 years ago
- ☆15Apr 11, 2024Updated last year
- [NeurIPS 2024] Can LLMs Learn by Teaching for Better Reasoning? A Preliminary Study☆59Nov 24, 2024Updated last year
- From Accuracy to Robustness: A Study of Rule- and Model-based Verifiers in Mathematical Reasoning.☆25Oct 7, 2025Updated 5 months ago
- Reinforcement Learning via Latent State Decoding☆29Jun 12, 2023Updated 2 years ago
- Official implementation of the paper "From Complex to Simple: Enhancing Multi-Constraint Complex Instruction Following Ability of Large L…☆53Jun 24, 2024Updated last year
- ☆22Sep 2, 2025Updated 6 months ago
- 😎 A summary of papers on knowledge-based text generation, with personal notes☆21Oct 5, 2024Updated last year
- VC-FB and MC-FB algorithms from "Zero-Shot Reinforcement Learning from Low Quality Data" (NeurIPS 2024)☆27Jan 14, 2025Updated last year
- ☆14Jan 24, 2025Updated last year
- Vintix: Action Model via In-Context Reinforcement Learning (ICML 2025)☆45May 23, 2025Updated 10 months ago
- LaMamba-Diff: Linear-Time High-Fidelity Diffusion Models Based on Local Attention and Mamba (Official Implementation)☆17Oct 24, 2024Updated last year
- 🚀 Sliding Window Attention Training for Efficient Large Language Models☆16Dec 8, 2025Updated 3 months ago
- Reinforcement learning algorithm implementation☆10Oct 31, 2021Updated 4 years ago
- Simple Conversational Data Augmentation for Semi-supervised Abstractive Conversation Summarization☆10Mar 7, 2022Updated 4 years ago
- In-Context Alignment: Chat with Vanilla Language Models Before Fine-Tuning☆35Aug 9, 2023Updated 2 years ago
- ☆12Oct 19, 2020Updated 5 years ago
- Unofficial implementation of AlpaGasus☆95Sep 23, 2023Updated 2 years ago
- ☆30Jun 10, 2020Updated 5 years ago
- An open source deep learning library for Unity.☆17Mar 15, 2026Updated last week
- Robust policy search algorithms which train on model ensembles☆30Oct 26, 2016Updated 9 years ago
- Self-Supervised Alignment with Mutual Information☆20May 24, 2024Updated last year