RL algorithm: Advantage-Induced Policy Alignment (APA)
☆66Aug 11, 2023Updated 2 years ago
Alternatives and similar repositories for RLHF-APA
Users that are interested in RLHF-APA are comparing it to the libraries listed below
- ☆282Jan 6, 2025Updated last year
- Direct preference optimization with f-divergences.☆16Nov 3, 2024Updated last year
- [Remote Sensing 2022] PGNet: Positioning Guidance Network for Semantic Segmentation of Very-High-Resolution Remote Sensing Images☆13Dec 9, 2022Updated 3 years ago
- [TIP 2025] This is an official PyTorch implementation of "Zero-Shot Skeleton-Based Action Recognition With Prototype-Guided Feature Align…☆34Jul 24, 2025Updated 7 months ago
- [IEEE TBD 2023] IEMask R-CNN: Information-enhanced Mask R-CNN☆16Mar 14, 2023Updated 3 years ago
- Q-Probe: A Lightweight Approach to Reward Maximization for Language Models☆40Jun 10, 2024Updated last year
- A multi-threaded C++ implementation of Nickel & Kiela's "Poincare Embeddings" paper from NIPS 2017, following the implementation of the a…☆17Jun 6, 2018Updated 7 years ago
- ☆34Oct 31, 2024Updated last year
- A System for Morphology-Task Generalization via Unified Representation and Behavior Distillation (ICLR2023)☆14Feb 3, 2023Updated 3 years ago
- Self-Alignment with Principle-Following Reward Models☆170Sep 18, 2025Updated 6 months ago
- Codes and files for the paper Are Emergent Abilities in Large Language Models just In-Context Learning☆33Jan 9, 2025Updated last year
- Code and data for "Improving Temporal Generalization of Pre-trained Language Models with Lexical Semantic Change" (EMNLP2022)☆18Dec 8, 2022Updated 3 years ago
- ☆22Aug 30, 2021Updated 4 years ago
- GPT* - Training faster small transformers using ALiBi, Parallel Residual Connections and more!☆21Oct 29, 2022Updated 3 years ago
- DPO, but faster 🚀☆48Dec 6, 2024Updated last year
- GenEnv: Difficulty-Aligned Co-Evolution Between LLM Agents and Environment Simulators☆50Dec 23, 2025Updated 2 months ago
- This repository is the code for the paper Optimizing Reusable Knowledge for Continual Learning via Metalearning.☆11Oct 12, 2021Updated 4 years ago
- Implementation of MixCE method described in ACL 2023 paper by Zhang et al.☆20May 29, 2023Updated 2 years ago
- [IEEE TCYB 2024] CTNet: Contrastive Transformer Network for Polyp Segmentation☆41Mar 19, 2025Updated last year
- GenRM-CoT: Data release for verification rationales☆67Oct 16, 2024Updated last year
- [IEEE TIP 2025] Cross-domain Few-shot Medical Image Segmentation via Dynamic Semantic Matching☆14Dec 23, 2025Updated 2 months ago
- Code for Posterior Sampling for Deep Reinforcement Learning, ICML 2023☆28Mar 7, 2024Updated 2 years ago
- [ICML 2024] Code for the paper "MoE-RBench: Towards Building Reliable Language Models with Sparse Mixture-of-Experts"☆10Jul 1, 2024Updated last year
- The rule-based evaluation subset and code implementation of Omni-MATH☆27Dec 23, 2024Updated last year
- ☆14Oct 11, 2023Updated 2 years ago
- Official implementation of TBA for async LLM post-training.☆29Nov 5, 2025Updated 4 months ago
- [NIPS 2025] Open-World Drone Active Tracking with Goal-Centered Rewards☆17Nov 3, 2025Updated 4 months ago
- The contrastive token loss function for reducing generative repetition of autoregressive neural language models.☆13May 11, 2022Updated 3 years ago
- Code and Configs for Asynchronous RLHF: Faster and More Efficient RL for Language Models☆68Mar 5, 2026Updated 2 weeks ago
- Image-based gridworld experiment for learning Markov state abstractions☆21Sep 16, 2024Updated last year
- ModelSoups for Tensorflow2 and Torch☆50Apr 27, 2022Updated 3 years ago
- The official source code for "Boosting LLM Agents with Recursive Contemplation for Effective Deception Handling" (ACL 2024, Findings)☆14Aug 12, 2024Updated last year
- ☆14Mar 5, 2024Updated 2 years ago
- Code for "The Expressive Power of Low-Rank Adaptation".☆20Apr 19, 2024Updated last year
- ☆27Jul 23, 2025Updated 7 months ago
- Multipack distributed sampler for fast padding-free training of LLMs☆206Aug 10, 2024Updated last year
- Code for [NeurIPS'2019 Spotlight] Policy Continuation with Hindsight Inverse Dynamics☆15Jan 7, 2020Updated 6 years ago
- A simple baseline for mountain-car @ gym☆11Jan 15, 2020Updated 6 years ago
- Codes to generate a bandgap database using ChemDataExtractor.☆10Jun 4, 2025Updated 9 months ago