LAMDASZ-ML / ChinaTravel
☆10Updated 2 weeks ago
Alternatives and similar repositories for ChinaTravel
Users who are interested in ChinaTravel are comparing it to the repositories listed below:
- Code for NeurIPS 2024 paper "Regularizing Hidden States Enables Learning Generalizable Reward Model for LLMs"☆28Updated last month
- Code and data for the paper "Understanding Hidden Context in Preference Learning: Consequences for RLHF"☆29Updated last year
- ✨✨Latest Advances on Neuro-Symbolic Learning in the era of Large Language Models☆55Updated last week
- Neuro-Symbolic Hierarchical Rule Induction☆12Updated 2 years ago
- A minimal example of Abductive Learning☆13Updated last year
- An efficient Python toolkit for Abductive Learning (ABL), a novel paradigm that integrates machine learning and logical reasoning in a un…☆59Updated 6 months ago
- Rewarded soups official implementation☆56Updated last year
- Official implementation of the ICLR 2025 paper: Rethinking Bradley-Terry Models in Preference-based Reward Modeling: Foundations, Theory, and…☆37Updated 3 weeks ago
- Code for the paper "Policy Optimization in RLHF: The Impact of Out-of-preference Data"☆28Updated last year
- Implementation of A Context-Integrated Transformer-Based Neural Network for Auction Design (ICML2022).☆16Updated 2 years ago
- What Makes a Reward Model a Good Teacher? An Optimization Perspective☆15Updated last week
- The code for our NeurIPS 2021 paper "Kernelized Heterogeneous Risk Minimization".☆12Updated 3 years ago
- Survey on Robust Weakly Supervised Learning☆13Updated 3 years ago
- Code for the paper: "Causal Influence Detection for Improving Efficiency in Reinforcement Learning", by Seitzer, M., Schölkopf, B., Marti…☆38Updated 3 years ago
- A curated paper list on neural symbolic and probabilistic logic.☆123Updated last year
- RLA is a tool for managing your RL experiments automatically☆27Updated 2 months ago
- [ACL'24, Outstanding Paper] Emulated Disalignment: Safety Alignment for Large Language Models May Backfire!☆34Updated 8 months ago
- Direct preference optimization with f-divergences.☆13Updated 4 months ago
- [NeurIPS 2024] "Can Language Models Perform Robust Reasoning in Chain-of-thought Prompting with Noisy Rationales?"☆35Updated 2 months ago
- Code for the paper "Query-Dependent Prompt Evaluation and Optimization with Offline Inverse Reinforcement Learning"☆39Updated last year
- Preprint: Asymmetry in Low-Rank Adapters of Foundation Models☆35Updated last year
- Representation Learning in RL☆16Updated 2 years ago
- [ACL'24] Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization☆72Updated 7 months ago
- Implementation of the Decision-Pretrained Transformer (DPT) from the paper Supervised Pretraining Can Learn In-Context Reinforcement Learni…☆60Updated 10 months ago
- A repo for RLHF training and BoN over LLMs, with support for reward model ensembles.☆41Updated 2 months ago
- Implementation of ICML 2023 paper: Future-conditioned Unsupervised Pretraining for Decision Transformer☆27Updated last year