abaheti95 / LoL-RL
Advantage Leftover Lunch Reinforcement Learning (A-LoL RL): Improving Language Models with Advantage-based Offline Policy Gradients
☆24 · Updated last week
Related projects:
- ☆65 · Updated 2 months ago
- Dataset Reset Policy Optimization ☆27 · Updated 5 months ago
- Rewarded soups official implementation ☆43 · Updated 11 months ago
- ☆23 · Updated 4 months ago
- ☆24 · Updated 2 weeks ago
- This is code for most of the experiments in the paper Understanding the Effects of RLHF on LLM Generalisation and Diversity ☆35 · Updated 8 months ago
- ☆23 · Updated 10 months ago
- Repository for Skill Set Optimization ☆12 · Updated last month
- Code for LaMPP: Language Models as Probabilistic Priors for Perception and Action ☆35 · Updated last year
- Official implementation of Bootstrapping Language Models via DPO Implicit Rewards ☆33 · Updated last month
- Directional Preference Alignment ☆44 · Updated 3 months ago
- Self-Supervised Alignment with Mutual Information ☆12 · Updated 3 months ago
- Personalized Soups: Personalized Large Language Model Alignment via Post-hoc Parameter Merging ☆84 · Updated 10 months ago
- ☆26 · Updated last year
- ☆22 · Updated 10 months ago
- Official repository of paper "RNNs Are Not Transformers (Yet): The Key Bottleneck on In-context Retrieval" ☆24 · Updated 5 months ago
- Official implementation of "Direct Preference-based Policy Optimization without Reward Modeling" (NeurIPS 2023) ☆37 · Updated last month
- ☆87 · Updated 2 months ago
- Code for the ICML 2024 paper "Rewards-in-Context: Multi-objective Alignment of Foundation Models with Dynamic Preference Adjustment" ☆38 · Updated last month
- Reference implementation for Reward-Augmented Decoding: Efficient Controlled Text Generation With a Unidirectional Reward Model ☆40 · Updated 8 months ago
- Research Code for "ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL" ☆84 · Updated 5 months ago
- ICML 2024 - Official Repository for EXO: Towards Efficient Exact Optimization of Language Model Alignment ☆45 · Updated 3 months ago
- ☆40 · Updated last year
- Official PyTorch Implementation of the Longhorn Deep State Space Model ☆35 · Updated last month
- Official implementation of FIND (NeurIPS '23) Function Interpretation Benchmark and Automated Interpretability Agents ☆42 · Updated 5 months ago
- Official code for the paper "Context-Aware Language Modeling for Goal-Oriented Dialogue Systems" ☆34 · Updated last year
- Is In-Context Learning Sufficient for Instruction Following in LLMs? ☆19 · Updated 3 months ago
- ☆15 · Updated this week
- Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision ☆78 · Updated last week
- LLM Dynamic Planner - Combining LLM with PDDL Planners to solve an embodied task ☆33 · Updated last week