ruizheng20 / gpo
Code for the paper "Toward Optimal LLM Alignments Using Two-Player Games".
☆14 Updated 5 months ago
Related projects
Alternatives and complementary repositories for gpo
- A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity. ☆57 Updated 2 weeks ago
- ☆33 Updated 9 months ago
- Official Repository for The Paper: Safety Alignment Should Be Made More Than Just a Few Tokens Deep ☆28 Updated 4 months ago
- ☆24 Updated 6 months ago
- Röttger et al. (2023): "XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models" ☆64 Updated 10 months ago
- ☆49 Updated last year
- ☆44 Updated 10 months ago
- [ACL 2023 Findings] What In-Context Learning “Learns” In-Context: Disentangling Task Recognition and Task Learning ☆22 Updated last year
- ☆38 Updated last year
- Official code for "Decoding-Time Language Model Alignment with Multiple Objectives". ☆14 Updated 3 weeks ago
- Rewarded soups official implementation ☆51 Updated last year
- Official code for ICML 2024 paper on Persona In-Context Learning (PICLe) ☆21 Updated 4 months ago
- Official code for the paper: Evaluating Copyright Takedown Methods for Language Models ☆15 Updated 4 months ago
- Lightweight Adapting for Black-Box Large Language Models ☆18 Updated 9 months ago
- ☆36 Updated 3 months ago
- Official code of the paper Large Language Models Are Implicitly Topic Models: Explaining and Finding Good Demonstrations for In-Context Le… ☆68 Updated 8 months ago
- Source codes for "Preference-grounded Token-level Guidance for Language Model Fine-tuning" (NeurIPS 2023). ☆14 Updated last year
- [ICLR'24] RAIN: Your Language Models Can Align Themselves without Finetuning ☆84 Updated 6 months ago
- ☆12 Updated 3 months ago
- Code for our paper: "GrIPS: Gradient-free, Edit-based Instruction Search for Prompting Large Language Models" ☆51 Updated last year
- The code of “Improving Weak-to-Strong Generalization with Scalable Oversight and Ensemble Learning” ☆15 Updated 8 months ago
- Restore safety in fine-tuned language models through task arithmetic ☆26 Updated 7 months ago
- Align your LM to express calibrated verbal statements of confidence in its long-form generations. ☆19 Updated 5 months ago
- ☆20 Updated 5 months ago
- ☆21 Updated last month
- ☆31 Updated last year
- ☆26 Updated 6 months ago
- Teaching Models to Express Their Uncertainty in Words ☆36 Updated 2 years ago
- Official Repository for Dataset Inference for LLMs ☆23 Updated 3 months ago