RL algorithm: Advantage-Induced Policy Alignment
☆66 Aug 11, 2023 Updated 2 years ago
Alternatives and similar repositories for RLHF-APA
Users who are interested in RLHF-APA are comparing it to the libraries listed below.
- ☆98 May 30, 2023 Updated 2 years ago
- Data and code for the paper "The Moral Integrity Corpus: A Benchmark for Ethical Dialogue Systems" ☆21 Jul 18, 2023 Updated 2 years ago
- ☆19 Jun 3, 2023 Updated 2 years ago
- ☆10 Jun 11, 2019 Updated 6 years ago
- ☆34 Oct 31, 2024 Updated last year
- A System for Morphology-Task Generalization via Unified Representation and Behavior Distillation (ICLR2023) ☆14 Feb 3, 2023 Updated 3 years ago
- Self-Alignment with Principle-Following Reward Models ☆170 Sep 18, 2025 Updated 6 months ago
- Code to accompany the paper "The Information Geometry of Unsupervised Reinforcement Learning" ☆20 Oct 6, 2021 Updated 4 years ago
- Codes and files for the paper Are Emergent Abilities in Large Language Models just In-Context Learning ☆33 Jan 9, 2025 Updated last year
- ☆22 Aug 30, 2021 Updated 4 years ago
- ☆35 Jan 29, 2023 Updated 3 years ago
- Repository for the paper Stream of Search: Learning to Search in Language ☆154 Feb 3, 2025 Updated last year
- This repository is the code of the paper Optimizing Reusable Knowledge for Continual Learning via Metalearning. ☆11 Oct 12, 2021 Updated 4 years ago
- Implementation of MixCE method described in ACL 2023 paper by Zhang et al. ☆20 May 29, 2023 Updated 2 years ago
- GenEnv: Difficulty-Aligned Co-Evolution Between LLM Agents and Environment Simulators ☆53 Dec 23, 2025 Updated 3 months ago
- DPO, but faster 🚀 ☆51 Dec 6, 2024 Updated last year
- GenRM-CoT: Data release for verification rationales ☆67 Oct 16, 2024 Updated last year
- ☆124 Feb 21, 2025 Updated last year
- The rule-based evaluation subset and code implementation of Omni-MATH ☆27 Dec 23, 2024 Updated last year
- ☆14 Oct 11, 2023 Updated 2 years ago
- Official implementation of TBA for async LLM post-training. ☆29 Nov 5, 2025 Updated 5 months ago
- The contrastive token loss function for reducing generative repetition of autoregressive neural language models. ☆13 May 11, 2022 Updated 3 years ago
- UniPrompt provides a unified interface to prompt optimization. We have distilled common functions from different algorithms and provide a… ☆19 May 20, 2025 Updated 10 months ago
- Code and Configs for Asynchronous RLHF: Faster and More Efficient RL for Language Models ☆68 Mar 5, 2026 Updated last month
- Script for processing OpenAI's PRM800K process supervision dataset into an Alpaca-style instruction-response format ☆27 Jul 12, 2023 Updated 2 years ago
- ☆13 Oct 23, 2018 Updated 7 years ago
- Image-based gridworld experiment for learning Markov state abstractions ☆21 Sep 16, 2024 Updated last year
- Code for "The Expressive Power of Low-Rank Adaptation". ☆20 Apr 19, 2024 Updated last year
- Accelerating the development of large multimodal models (LMMs) with lmms-eval ☆14 Oct 14, 2024 Updated last year
- This is a project using Pytorch to fulfill reinforcement learning on a simple game - Gridworld ☆13 Jul 13, 2020 Updated 5 years ago
- ☆14 Mar 5, 2024 Updated 2 years ago
- ☆28 Jul 23, 2025 Updated 8 months ago
- Multipack distributed sampler for fast padding-free training of LLMs ☆207 Aug 10, 2024 Updated last year
- A variation on a standard Decision Tree such as that in sklearn, where nodes may be based on an aggregation of multiple splits. ☆10 May 24, 2024 Updated last year
- Training GPTs to solve interaction nets ☆18 Aug 14, 2024 Updated last year
- Code Release for Task Agnostic Dynamics Priors for Deep Reinforcement Learning ☆12 Jun 13, 2019 Updated 6 years ago
- A simple baseline for mountain-car @ gym ☆11 Jan 15, 2020 Updated 6 years ago
- Codes to generate a bandgap database using ChemDataExtractor. ☆10 Jun 4, 2025 Updated 10 months ago
- Implementation of a Hierarchical Mamba as described in the paper: "Hierarchical State Space Models for Continuous Sequence-to-Sequence Mo… ☆15 Nov 11, 2024 Updated last year