liziniu/GEM

Code for Paper (Preserving Diversity in Supervised Fine-tuning of Large Language Models)

☆58

Alternatives and similar repositories for GEM

Users that are interested in GEM are comparing it to the libraries listed below.

liziniu / cold_start_rl
Code for Blog Post: Can Better Cold-Start Strategies Improve RL Training for LLMs?
☆20 · Mar 9, 2025 · Updated last year
tangzhy / RealCritic
☆15 · Jan 27, 2025 · Updated last year
liziniu / policy_optimization
Code for Paper (Policy Optimization in RLHF: The Impact of Out-of-preference Data)
☆29 · Dec 19, 2023 · Updated 2 years ago
Shentao-YANG / Preference_Grounded_Guidance
Source codes for "Preference-grounded Token-level Guidance for Language Model Fine-tuning" (NeurIPS 2023).
☆17 · Jan 8, 2025 · Updated last year
ernie-research / CD-RLHF
[ACL'25] Official code of curiosity-driven RLHF
☆16 · Jun 22, 2025 · Updated last year
liziniu / ReMax
Code for Paper (ReMax: A Simple, Efficient and Effective Reinforcement Learning Method for Aligning Large Language Models)
☆202 · Dec 16, 2023 · Updated 2 years ago
TheRoadQaQ / ReLIFT
Official Repository of "Learning what reinforcement learning can't"
☆85 · Dec 30, 2025 · Updated 6 months ago
FreedomIntelligence / MyPhoneBench
MyPhoneBench: Do Phone-Use Agents Respect Your Privacy?
☆24 · Apr 3, 2026 · Updated 3 months ago
ZhangXJ199 / EDGE-GRPO
Entropy-Driven GRPO with Guided Error Correction for Advantage Diversity
☆22 · Aug 28, 2025 · Updated 10 months ago
zwhong714 / PSFT
[ICLR 2026] PSFT is a trust-region–inspired fine-tuning objective that views SFT as a policy gradient method with constant advantages, co…
☆38 · Sep 9, 2025 · Updated 10 months ago
limenlp / verl
AdaRFT: Efficient Reinforcement Finetuning via Adaptive Curriculum Learning
☆56 · Jun 13, 2025 · Updated last year
ChengpengLi1003 / CoRT
☆72 · Oct 23, 2025 · Updated 9 months ago
ars22 / scaling-LLM-math-synthetic-data
Code and data used in the paper: "Training on Incorrect Synthetic Data via RL Scales LLM Math Reasoning Eight-Fold"
☆32 · Jun 16, 2024 · Updated 2 years ago
AndreHe02 / rewarding-unlikely-release
☆15 · Jun 10, 2025 · Updated last year
XiangLi1999 / AutoBencher
☆33 · Jul 11, 2024 · Updated 2 years ago
hrtan / MoSo
[NeurIPS-2023] The PyTorch Implementation of MoSo. The algorithms are based on our paper: "Data Pruning via Moving-one-Sample-out". MoSo …
☆10 · May 21, 2026 · Updated 2 months ago
sunblaze-ucb / omega
☆47 · Jun 24, 2025 · Updated last year
allenai / easy-to-hard-generalization
Code for the arXiv preprint "The Unreasonable Effectiveness of Easy Training Data"
☆48 · Jan 17, 2024 · Updated 2 years ago
EnnengYang / An-Efficient-Dataset-Condensation-Plugin
An Efficient Dataset Condensation Plugin and Its Application to Continual Learning. NeurIPS, 2023.
☆12 · Nov 29, 2023 · Updated 2 years ago
ZitongYang / Synthetic_Continued_Pretraining
Code implementation of synthetic continued pretraining
☆162 · Jan 6, 2025 · Updated last year
Zanette-Labs / speed-rl
☆18 · Feb 2, 2026 · Updated 5 months ago
princeton-nlp / unintentional-unalignment
[ICLR 2025] Unintentional Unalignment: Likelihood Displacement in Direct Preference Optimization
☆32 · Jan 7, 2026 · Updated 6 months ago
TsinghuaC3I / Intuitive-Fine-Tuning
[ACL 2025, Main Conference, Oral] Intuitive Fine-Tuning: Towards Simplifying Alignment into a Single Process
☆30 · Aug 2, 2024 · Updated last year
MasterVito / DAC-RL
Official Repo for DAC-RL: Training LLMs for Divide-and-Conquer Reasoning Elevates Test-Time Scalability
☆16 · Feb 26, 2026 · Updated 5 months ago
ChiyuSONG / dynamics-of-instruction-tuning
☆18 · Mar 3, 2025 · Updated last year
DAMO-NLP-SG / MT-LLaMA
Multi-Task instruction-tuned LLaMA
☆14 · May 5, 2023 · Updated 3 years ago
QingyangZhang / TEMPO
Scaling Test-time Training for LLM Reasoning
☆27 · Apr 14, 2026 · Updated 3 months ago
TianyunYoung / Hallucination-Attribution
This repo contains the code for the paper "Understanding and Mitigating Hallucinations in Large Vision-Language Models via Modular Attrib…
☆39 · Jul 14, 2025 · Updated last year
Infini-AI-Lab / GRESO
☆82 · Jun 8, 2026 · Updated last month
Algorithmic-Alignment-Lab / CommonClaim
Explore, Establish, Exploit: Red Teaming Language Models from Scratch
☆15 · Jun 21, 2023 · Updated 3 years ago
BiEchi / DistributedTrainingGPT2
Research code for pretraining with various data-parallel approaches, based on PyTorch GPT-2.
☆11 · Dec 16, 2022 · Updated 3 years ago
huanranchen / LLMLandscape
The loss landscape of Large Language Models resemble basin!
☆41 · Jul 8, 2025 · Updated last year
TianHongZXY / RLVR-Decomposed
[NeurIPS 2025] Implementation for the paper "The Surprising Effectiveness of Negative Reinforcement in LLM Reasoning"
☆166 · Mar 2, 2026 · Updated 4 months ago
facebookresearch / rlfh-gen-div
This is code for most of the experiments in the paper Understanding the Effects of RLHF on LLM Generalisation and Diversity
☆50 · Jan 19, 2024 · Updated 2 years ago
wzhouad / WPO
Code and models for EMNLP 2024 paper "WPO: Enhancing RLHF with Weighted Preference Optimization"
☆41 · Sep 24, 2024 · Updated last year
Simplified-Reasoning / LUFFY
Official Repository of "Learning to Reason under Off-Policy Guidance"
☆460 · Mar 20, 2026 · Updated 4 months ago
yidingjiang / ado
The repository contains code for Adaptive Data Optimization
☆37 · Dec 9, 2024 · Updated last year
GAIR-NLP / OctoThinker
Revisiting Mid-training in the Era of Reinforcement Learning Scaling
☆189 · Jul 23, 2025 · Updated last year
rohinmanvi / Capability-Aware-and-Mid-Generation-Self-Evaluations
☆21 · Jul 25, 2025 · Updated last year