holarissun / embedding-based-llm-alignment
Codebase for the paper "Reusing Embeddings: Reproducible Reward Model Research in Large Language Model Alignment without GPUs"
☆22 · Updated 9 months ago
Alternatives and similar repositories for embedding-based-llm-alignment
Users interested in embedding-based-llm-alignment are comparing it to the repositories listed below.
- Official implementation of the ICLR 2025 paper: Rethinking Bradley-Terry Models in Preference-based Reward Modeling: Foundations, Theory, and…☆70 · Updated 10 months ago
- ☆46 · Updated 2 years ago
- Source code for "Preference-grounded Token-level Guidance for Language Model Fine-tuning" (NeurIPS 2023).☆17 · Updated last year
- ☆52 · Updated 2 years ago
- Code for the ACL 2024 paper: Adversarial Preference Optimization (APO).☆56 · Updated last year
- Links to publications that focus on the interpretation and analysis of in-context learning☆15 · Updated last year
- An index of algorithms for reinforcement learning from human feedback (RLHF)☆92 · Updated last year
- Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision☆124 · Updated last year
- Official repository for the ICLR 2024 Spotlight paper "Large Language Models Are Not Robust Multiple Choice Selectors"☆43 · Updated 8 months ago
- Code for the paper "Toward Optimal LLM Alignments Using Two-Player Games".☆17 · Updated last year
- A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity.☆85 · Updated 11 months ago
- ☆57 · Updated 8 months ago
- [NeurIPS 2024] Official code of $\beta$-DPO: Direct Preference Optimization with Dynamic $\beta$☆50 · Updated last year
- ☆47 · Updated 10 months ago
- ☆62 · Updated 8 months ago
- TRACE: A Comprehensive Benchmark for Continual Learning in Large Language Models☆87 · Updated 2 years ago
- SimPER: A Minimalist Approach to Preference Alignment without Hyperparameters (ICLR 2025)☆17 · Updated 5 months ago
- Official implementation of the Reward rAnked Fine-Tuning Algorithm (RAFT), also known as iterative best-of-n fine-tuning or re…☆39 · Updated last year
- ICML 2024 - Official repository for EXO: Towards Efficient Exact Optimization of Language Model Alignment☆57 · Updated last year
- Teaching Models to Express Their Uncertainty in Words☆39 · Updated 3 years ago
- Fairer Preferences Elicit Improved Human-Aligned Large Language Model Judgments (Zhou et al., EMNLP 2024)☆14 · Updated last year
- A curated list of awesome resources dedicated to Scaling Laws for LLMs☆82 · Updated 2 years ago
- ☆78 · Updated last year
- Code for the paper "Policy Optimization in RLHF: The Impact of Out-of-preference Data"☆28 · Updated 2 years ago
- GenRM-CoT: Data release for verification rationales☆67 · Updated last year
- [NeurIPS 2023] GitHub repository for "Composing Parameter-Efficient Modules with Arithmetic Operations"☆61 · Updated 2 years ago
- [ACL 2024 main] Aligning Large Language Models with Human Preferences through Representation Engineering (https://aclanthology.org/2024.…☆28 · Updated last year
- Official code of the paper Large Language Models Are Implicitly Topic Models: Explaining and Finding Good Demonstrations for In-Context Le…☆75 · Updated last year
- Domain-specific preference (DSP) data and customized RM fine-tuning.☆25 · Updated last year
- Implementation of self-certainty as an extension of the ZeroEval project☆34 · Updated 8 months ago