ctlllll / reward_collapseLinks

☆27

Alternatives and similar repositories for reward_collapse

Users that are interested in reward_collapse are comparing it to the libraries listed below

Sorting:

allenai / easy-to-hard-generalization
Code for the arXiv preprint "The Unreasonable Effectiveness of Easy Training Data"
☆48Updated last year
yidingjiang / ado
The repository contains code for Adaptive Data Optimization
☆26Updated 10 months ago
limenlp / safer-instruct
This is the oficial repository for "Safer-Instruct: Aligning Language Models with Automated Preference Data"
☆17Updated last year
ylsung / vl-merging
PyTorch codes for the paper "An Empirical Study of Multimodal Model Merging"
☆37Updated 2 years ago
gregorbachmann / Next-Token-Failures
☆103Updated last year
martin-wey / CodeUltraFeedback
CodeUltraFeedback: aligning large language models to coding preferences (TOSEM 2025)
☆71Updated last year
casmlab / NPHardEval
Repository for NPHardEval, a quantified-dynamic benchmark of LLMs
☆59Updated last year
maszhongming / ParaKnowTransfer
Code for "Seeking Neural Nuggets: Knowledge Transfer in Large Language Models from a Parametric Perspective"
☆32Updated last year
r-three / RAD
Reference implementation for Reward-Augmented Decoding: Efficient Controlled Text Generation With a Unidirectional Reward Model
☆43Updated 3 weeks ago
hamishivi / automated-instruction-selection
Exploration of automated dataset selection approaches at large scales.
☆48Updated 7 months ago
abhishekpanigrahi1996 / transformer_in_transformer
☆45Updated 2 years ago
katiekang1998 / reasoning_generalization
☆33Updated 9 months ago
zhangir-azerbayev / MetaMath
☆11Updated 2 years ago
cassidylaidlaw / orpo
☆19Updated 11 months ago
open-compass / GPassK
[ACL 2025] Are Your LLMs Capable of Stable Reasoning?
☆30Updated 2 months ago
hughbzhang / o1_inference_scaling_laws
Replicating O1 inference-time scaling laws
☆90Updated 10 months ago
RLHFlow / Directional-Preference-Alignment
Directional Preference Alignment
☆57Updated last year
lingo-mit / lm-truthfulness
☆17Updated last year
EleutherAI / pile_dedupe
Pile Deduplication Code
☆19Updated 2 years ago
princeton-nlp / PTP
Improving Language Understanding from Screenshots. Paper: https://arxiv.org/abs/2402.14073
☆31Updated last year
feyzaakyurek / rl4f
Code for RL4F: Generating Natural Language Feedback with Reinforcement Learning for Repairing Model Outputs. ACL 2023.
☆64Updated 11 months ago
hkust-nlp / llm-compression-intelligence
Official github repo for the paper "Compression Represents Intelligence Linearly" [COLM 2024]
☆142Updated last year
formll / resolving-scaling-law-discrepancies
☆20Updated last year
janphilippfranken / sami
Self-Supervised Alignment with Mutual Information
☆21Updated last year
Pranjal2041 / AdaptiveConsistency
Let's Sample Step by Step: Adaptive-Consistency for Efficient Reasoning with LLMs
☆39Updated last year
dangxingyu / rnn-icrag
Official repository of paper "RNNs Are Not Transformers (Yet): The Key Bottleneck on In-context Retrieval"
☆27Updated last year
sail-sg / dice
Official implementation of Bootstrapping Language Models via DPO Implicit Rewards
☆44Updated 6 months ago
safety-research / SHADE-Arena
☆19Updated 4 months ago
matchten / LoRA-Models-for-SAEs
Code for reproducing our paper "Low Rank Adapting Models for Sparse Autoencoder Features"
☆17Updated 6 months ago
yikangshen / megablocks
☆20Updated last year