holarissun / embedding-based-llm-alignment
Codebase for the paper "Reusing Embeddings: Reproducible Reward Model Research in Large Language Model Alignment without GPUs"
☆22 · Updated 9 months ago
Alternatives and similar repositories for embedding-based-llm-alignment
Users interested in embedding-based-llm-alignment are comparing it to the repositories listed below.
- Official implementation of the ICLR 2025 paper: Rethinking Bradley-Terry Models in Preference-based Reward Modeling: Foundations, Theory, and…☆70 · Updated 10 months ago
- ☆46 · Updated 2 years ago
- Source code for "Preference-grounded Token-level Guidance for Language Model Fine-tuning" (NeurIPS 2023).☆17 · Updated last year
- ☆52 · Updated 2 years ago
- Code for the ACL 2024 paper: Adversarial Preference Optimization (APO).☆56 · Updated last year
- Links to publications that focus on the interpretation and analysis of in-context learning☆15 · Updated last year
- An index of algorithms for reinforcement learning from human feedback (RLHF)☆92 · Updated last year
- Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision☆124 · Updated last year
- Official repository for the ICLR 2024 Spotlight paper "Large Language Models Are Not Robust Multiple Choice Selectors"☆43 · Updated 8 months ago
- Code for the paper "Toward Optimal LLM Alignments Using Two-Player Games".☆17 · Updated last year
- A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity.☆85 · Updated 11 months ago
- ☆57 · Updated 8 months ago
- [NeurIPS 2024] Official code of $\beta$-DPO: Direct Preference Optimization with Dynamic $\beta$☆50 · Updated last year
- ☆47 · Updated 10 months ago
- ☆62 · Updated 8 months ago
- TRACE: A Comprehensive Benchmark for Continual Learning in Large Language Models☆87 · Updated 2 years ago
- SimPER: A Minimalist Approach to Preference Alignment without Hyperparameters (ICLR 2025)☆17 · Updated 5 months ago
- Official implementation of the Reward rAnked Fine-Tuning Algorithm (RAFT), also known as iterative best-of-n fine-tuning or re…☆39 · Updated last year
- ICML 2024 - Official repository for EXO: Towards Efficient Exact Optimization of Language Model Alignment☆57 · Updated last year
- Teaching Models to Express Their Uncertainty in Words☆39 · Updated 3 years ago
- Fairer Preferences Elicit Improved Human-Aligned Large Language Model Judgments (Zhou et al., EMNLP 2024)☆14 · Updated last year
- A curated list of awesome resources dedicated to Scaling Laws for LLMs☆82 · Updated 2 years ago
- ☆78 · Updated last year
- Code for the paper "Policy Optimization in RLHF: The Impact of Out-of-preference Data"☆28 · Updated 2 years ago
- GenRM-CoT: Data release for verification rationales☆67 · Updated last year
- [NeurIPS 2023] GitHub repository for "Composing Parameter-Efficient Modules with Arithmetic Operations"☆61 · Updated 2 years ago
- [ACL 2024 main] Aligning Large Language Models with Human Preferences through Representation Engineering (https://aclanthology.org/2024.…☆28 · Updated last year
- Official code of the paper Large Language Models Are Implicitly Topic Models: Explaining and Finding Good Demonstrations for In-Context Le…☆75 · Updated last year
- Domain-specific preference (DSP) data and customized RM fine-tuning.☆25 · Updated last year
- Implementation of self-certainty as an extension of the ZeroEval project☆34 · Updated 8 months ago