Xuekai-Zhu / key-configuration-of-llms

☆23

Alternatives and similar repositories for key-configuration-of-llms

Users who are interested in key-configuration-of-llms are comparing it to the repositories listed below.

multimodal-art-projection / KORGym
☆51Updated 4 months ago
louieworth / awesome-rlhf
An index of algorithms for reinforcement learning from human feedback (RLHF)
☆92Updated last year
PKU-Alignment / aligner
[NeurIPS 2024 Oral] Aligner: Efficient Alignment by Learning to Correct
☆186Updated 9 months ago
Zhou-Zoey / RMB-Reward-Model-Benchmark
☆42Updated 6 months ago
FreedomIntelligence / OVM
☆69Updated last year
hanningzhang / prm
☆17Updated 11 months ago
Vance0124 / Token-level-Direct-Preference-Optimization
Reference implementation for Token-level Direct Preference Optimization (TDPO)
☆148Updated 8 months ago
ValueCompass / Alignment-Goal-Survey
☆29Updated last year
RUCAIBox / CARP
☆17Updated 2 years ago
genrm-star / genrm-critiques
GenRM-CoT: Data release for verification rationales
☆66Updated last year
PRIME-RL / Entropy-Mechanism-of-RL
The Entropy Mechanism of Reinforcement Learning for Large Language Model Reasoning.
☆345Updated 3 months ago
ZHZisZZ / modpo
[ACL'24] Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization
☆89Updated last year
Edward-Sun / easy-to-hard
Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision
☆123Updated last year
LLaMafia / SFT_function_learning
Explore what LLMs are really learning over SFT
☆29Updated last year
lancopku / label-words-are-anchors
Repository for Label Words are Anchors: An Information Flow Perspective for Understanding In-Context Learning
☆165Updated last year
ssmisya / PRMBench
[ACL'25] The official code repository for PRMBench: A Fine-grained and Challenging Benchmark for Process-Level Reward Models.
☆81Updated 8 months ago
SparkJiao / dpo-trajectory-reasoning
[EMNLP 2024] Source code for the paper "Learning Planning-based Reasoning with Trajectory Collection and Process Rewards Synthesizing".
☆82Updated 9 months ago
GAIR-NLP / alignment-for-honesty
☆75Updated last year
liziniu / ReMax
Code for Paper (ReMax: A Simple, Efficient and Effective Reinforcement Learning Method for Aligning Large Language Models)
☆193Updated last year
TianHongZXY / RLVR-Decomposed
[NeurIPS 2025] Implementation for the paper "The Surprising Effectiveness of Negative Reinforcement in LLM Reasoning"
☆112Updated last month
RZFan525 / Awesome-ScalingLaws
A curated list of awesome resources dedicated to Scaling Laws for LLMs
☆79Updated 2 years ago
CJReinforce / PURE
Official code for the paper, "Stop Summation: Min-Form Credit Assignment Is All Process Reward Model Needs for Reasoning"
☆137Updated 3 months ago
sanowl / Self-Correcting-LLM--Reinforcement-Learning-
This is my attempt to create Self-Correcting-LLM, based on the paper "Training Language Models to Self-Correct via Reinforcement Learning" by g…
☆37Updated 3 months ago
Freder-chen / ReasonGenRM
A simple implementation of ReasonGenRM.
☆17Updated 5 months ago
xufangzhi / ENVISIONS
[ACL 2025] A Neural-Symbolic Self-Training Framework
☆115Updated 4 months ago
RyanLiu112 / Awesome-Process-Reward-Models
A comprehensive collection of process reward models.
☆111Updated 2 weeks ago
OpenMOSS / Say-I-Dont-Know
[ICML'2024] Can AI Assistants Know What They Don't Know?
☆83Updated last year
RLHFlow / RAFT
This is an official implementation of the Reward rAnked Fine-Tuning Algorithm (RAFT), also known as iterative best-of-n fine-tuning or re…
☆37Updated last year
WooooDyy / MathCritique
Implementation for the research paper "Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision".
☆56Updated 10 months ago
RenShuhuai-Andy / my-tools
my commonly-used tools
☆61Updated 9 months ago