tlc4418 / llm_optimization
A repo for RLHF training and best-of-n (BoN) sampling over LLMs, with support for reward model ensembles.
☆26 · Updated 6 months ago
Related projects:
- ☆30 · Updated 7 months ago
- Rewarded soups official implementation ☆43 · Updated 11 months ago
- Source code for "Preference-grounded Token-level Guidance for Language Model Fine-tuning" (NeurIPS 2023). ☆12 · Updated 10 months ago
- Code for the paper "Policy Optimization in RLHF: The Impact of Out-of-preference Data" ☆23 · Updated 9 months ago
- ☆65 · Updated 2 months ago
- ☆15 · Updated last week
- Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision ☆78 · Updated last week
- Code for most of the experiments in the paper "Understanding the Effects of RLHF on LLM Generalisation and Diversity" ☆35 · Updated 8 months ago
- ☆75 · Updated last month
- Align your LM to express calibrated verbal statements of confidence in its long-form generations. ☆16 · Updated 3 months ago
- Research code for "ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL" ☆84 · Updated 5 months ago
- Implements the Messenger environment and EMMA model. ☆22 · Updated last year
- Influence Functions with (Eigenvalue-corrected) Kronecker-Factored Approximate Curvature ☆93 · Updated last month
- ☆26 · Updated 5 months ago
- Experiments and code to generate the GINC small-scale in-context learning dataset from "An Explanation for In-context Learning as Implici… ☆94 · Updated 10 months ago
- Code for the paper "Toward Optimal LLM Alignments Using Two-Player Games". ☆11 · Updated 3 months ago
- ☆25 · Updated 10 months ago
- ☆69 · Updated 10 months ago
- Teaching Models to Express Their Uncertainty in Words ☆36 · Updated 2 years ago
- [ACL'2024, Outstanding Paper] Emulated Disalignment: Safety Alignment for Large Language Models May Backfire! ☆27 · Updated last month
- ☆12 · Updated 3 years ago
- Personalized Soups: Personalized Large Language Model Alignment via Post-hoc Parameter Merging ☆84 · Updated 10 months ago
- ☆68 · Updated 2 months ago
- Minimal but scalable implementation of large language models in JAX ☆17 · Updated 3 weeks ago
- ☆24 · Updated 2 weeks ago
- NeuroSurgeon is a package that enables researchers to uncover and manipulate subnetworks within models in Hugging Face Transformers ☆33 · Updated last month
- Official implementation of "Direct Preference-based Policy Optimization without Reward Modeling" (NeurIPS 2023) ☆37 · Updated 2 months ago
- ☆56 · Updated last year
- Direct preference optimization with f-divergences. ☆11 · Updated last week
- ☆117 · Updated 8 months ago