SimPER: A Minimalist Approach to Preference Alignment without Hyperparameters (ICLR 2025)
☆17Aug 22, 2025Updated 6 months ago
Alternatives and similar repositories for SimPER
Users that are interested in SimPER are comparing it to the libraries listed below
- Companion code to https://arxiv.org/abs/2409.03797v2☆19Sep 18, 2025Updated 5 months ago
- Code for the paper: Dense Reward for Free in Reinforcement Learning from Human Feedback (ICML 2024) by Alex J. Chan, Hao Sun, Samuel Holt…☆38Aug 11, 2024Updated last year
- ☆34Nov 18, 2025Updated 3 months ago
- Variational Walkback, NIPS'17☆28Oct 18, 2017Updated 8 years ago
- Implementation of the models and datasets used in "An Information-theoretic Approach to Distribution Shifts"☆25Nov 2, 2021Updated 4 years ago
- Agent Skill Induction: "Inducing Programmatic Skills for Agentic Tasks"☆38Apr 24, 2025Updated 10 months ago
- [ICLR 2026] RPG: KL-Regularized Policy Gradient (https://arxiv.org/abs/2505.17508)☆64Feb 19, 2026Updated 2 weeks ago
- Repo for the EACL2017 tutorial on imitation learning☆28Apr 3, 2017Updated 8 years ago
- Implementation of PCA algorithm using Gram-Schmidt modification on NIPALS☆10Jun 13, 2015Updated 10 years ago
- Code for the paper "Distinguishing the Knowable from the Unknowable with Language Models"☆11Apr 15, 2024Updated last year
- Book: Practical Probabilistic Machine Learning in Python☆10Apr 3, 2021Updated 4 years ago
- Pascal2 Harvest project QuEst☆14Sep 15, 2014Updated 11 years ago
- E-commerce platform with separated frontend and backend, built with python3+django+django-rest-framework+vue+xadmin☆10Dec 8, 2022Updated 3 years ago
- Uncovering User Interest from Biased and Noised Watch Time in Video Recommendation. In Recsys23.☆11Jul 18, 2023Updated 2 years ago
- ☆10Nov 15, 2023Updated 2 years ago
- Factorized Personalized Markov Chains for Next Basket Recommendation in R and Python☆13Jan 7, 2018Updated 8 years ago
- ☆10Jul 8, 2021Updated 4 years ago
- Teaching a humanoid to walk(ish), then displaying in your browser (using tensorflow.js and reinforcement learning)☆10Sep 7, 2020Updated 5 years ago
- Reference implementation of algorithms for reinforcement learning and Markov decision processes.☆12Jan 28, 2021Updated 5 years ago
- C++ implementation of HPYLM☆11May 2, 2017Updated 8 years ago
- Neural machine translation with Recurrent Deterministic Policy Gradient☆10Aug 18, 2016Updated 9 years ago
- ☆10Aug 10, 2017Updated 8 years ago
- ☆10Jul 5, 2016Updated 9 years ago
- Fair Benchmarks☆10Mar 14, 2019Updated 6 years ago
- 🤖 Implementation of Self Normalizing Networks (SNN) in PyTorch.☆12Jun 19, 2017Updated 8 years ago
- Offline Policy Evaluation via Adaptive Weighting with Data from Contextual Bandits☆10Oct 21, 2024Updated last year
- EasyRLHF aims to provide an easy and minimal interface to train aligned language models, using off-the-shelf solutions and datasets☆10Dec 12, 2023Updated 2 years ago
- ☆10Jun 28, 2020Updated 5 years ago
- Query Expansion using word2vec☆11Jul 18, 2019Updated 6 years ago
- Cross-domain word representation learning☆10May 23, 2015Updated 10 years ago
- Code for "Using Embeddings to Correct for Unobserved Confounding"☆10May 31, 2019Updated 6 years ago
- The Tweets2013 Internet Archive collection☆10Aug 7, 2020Updated 5 years ago
- ☆12Feb 9, 2024Updated 2 years ago
- Pairwise Interaction Tensor Factorization☆10Oct 11, 2018Updated 7 years ago
- ☆26Jul 29, 2025Updated 7 months ago
- ⚙️ Lightweight & smart Bun & Browser configuration loader.☆15Updated this week
- FamilyTool benchmark☆12Sep 10, 2025Updated 5 months ago
- ☆13Oct 25, 2019Updated 6 years ago
- Concise Reasoning via Reinforcement Learning☆13Apr 16, 2025Updated 10 months ago