sail-sg / diceLinks

Official implementation of Bootstrapping Language Models via DPO Implicit Rewards

☆44

Alternatives and similar repositories for dice

Users that are interested in dice are comparing it to the libraries listed below

Sorting:

RLHFlow / Directional-Preference-Alignment
Directional Preference Alignment
☆59Updated 10 months ago
PKU-ML / LongPPL
Code for ICLR 2025 Paper "What is Wrong with Perplexity for Long-context Language Modeling?"
☆92Updated last week
hbin0701 / Self-Explore
[𝐄𝐌𝐍𝐋𝐏 𝐅𝐢𝐧𝐝𝐢𝐧𝐠𝐬 𝟐𝟎𝟐𝟒 & 𝐀𝐂𝐋 𝟐𝟎𝟐𝟒 𝐍𝐋𝐑𝐒𝐄 𝐎𝐫𝐚𝐥] 𝘌𝘯𝘩𝘢𝘯𝘤𝘪𝘯𝘨 𝘔𝘢𝘵𝘩𝘦𝘮𝘢𝘵𝘪𝘤𝘢𝘭 𝘙𝘦𝘢𝘴𝘰𝘯𝘪𝘯…
☆51Updated last year
chujiezheng / LLM-Extrapolation
Official repository for ACL 2025 paper "Model Extrapolation Expedites Alignment"
☆75Updated 2 months ago
princeton-nlp / Edge-Pruning
[NeurIPS 2024 Spotlight] Code and data for the paper "Finding Transformer Circuits with Edge Pruning".
☆59Updated last week
SynthLabsAI / big-math
A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models
☆59Updated 5 months ago
liziniu / GEM
Code for Paper (Preserving Diversity in Supervised Fine-tuning of Large Language Models)
☆35Updated 2 months ago
Leooyii / LCEG
Long Context Extension and Generalization in LLMs
☆58Updated 10 months ago
facebookresearch / RLCD
Reproduction of "RLCD Reinforcement Learning from Contrast Distillation for Language Model Alignment
☆69Updated last year
kamanphoebe / Look-into-MoEs
[NAACL 2025] A Closer Look into Mixture-of-Experts in Large Language Models
☆52Updated 5 months ago
hamishivi / automated-instruction-selection
Exploration of automated dataset selection approaches at large scales.
☆47Updated 5 months ago
janphilippfranken / sami
Self-Supervised Alignment with Mutual Information
☆21Updated last year
sail-sg / AnytimeReasoner
Optimizing Anytime Reasoning via Budget Relative Policy Optimization
☆43Updated 3 weeks ago
sail-sg / VeriFree
Reinforcing General Reasoning without Verifiers
☆76Updated last month
ryoungj / BoLT
Code for "Reasoning to Learn from Latent Thoughts"
☆114Updated 4 months ago
sail-sg / SkyLadder
The official repository for SkyLadder: Better and Faster Pretraining via Context Window Scheduling
☆33Updated last week
facebookresearch / rlfh-gen-div
This is code for most of the experiments in the paper Understanding the Effects of RLHF on LLM Generalisation and Diversity
☆44Updated last year
haozheji / exact-optimization
ICML 2024 - Official Repository for EXO: Towards Efficient Exact Optimization of Language Model Alignment
☆58Updated last year
holarissun / RewardModelingBeyondBradleyTerry
official implementation of ICLR'2025 paper: Rethinking Bradley-Terry Models in Preference-based Reward Modeling: Foundations, Theory, and…
☆66Updated 4 months ago
RLHFlow / RAFT
This is an official implementation of the Reward rAnked Fine-Tuning Algorithm (RAFT), also known as iterative best-of-n fine-tuning or re…
☆33Updated 10 months ago
Jiuzhouh / Uncertainty-Aware-Language-Agent
This is the official repo for Towards Uncertainty-Aware Language Agent.
☆26Updated 11 months ago
KbsdJames / omni-math-rule
The rule-based evaluation subset and code implementation of Omni-MATH
☆22Updated 7 months ago
GAIR-NLP / ReasonEval
[AAAI 2025 oral] Evaluating Mathematical Reasoning Beyond Accuracy
☆63Updated 7 months ago
sail-sg / ActivePRM
☆17Updated 3 months ago
Edward-Sun / easy-to-hard
Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision
☆123Updated 10 months ago
test-time-interaction / TTI
☆52Updated last month
googleinterns / localizing-paragraph-memorization
☆14Updated last year
hkust-nlp / B-STaR
B-STAR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners
☆82Updated 2 months ago
GAIR-NLP / weak-to-strong-reasoning
☆59Updated 11 months ago
RUCAIBox / RLMEC
The official repository of "Improving Large Language Models via Fine-grained Reinforcement Learning with Minimum Editing Constraint"
☆38Updated last year