FuRuF-11 / AITLinks

A repository introducing algorithmic information theory. Learn what Kolmogorov complexity is and why it is important.

☆11

Alternatives and similar repositories for AIT

Users who are interested in AIT are comparing it to the repositories listed below.

wiio12 / LEGO-Prover
Code for the paper LEGO-Prover: Neural Theorem Proving with Growing Libraries
☆68 Updated last year
YangRui2015 / Generalizable-Reward-Model
Code for NeurIPS 2024 paper "Regularizing Hidden States Enables Learning Generalizable Reward Model for LLMs"
☆43 Updated 9 months ago
alexrame / rewardedsoups
Rewarded soups official implementation
☆62 Updated 2 years ago
McGill-NLP / VinePPO
Code for the paper "VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment"
☆182 Updated 6 months ago
thinking-machines-lab / tinker-project-ideas
Ideas for projects related to Tinker
☆112 Updated last month
holarissun / RewardModelingBeyondBradleyTerry
Official implementation of the ICLR 2025 paper: Rethinking Bradley-Terry Models in Preference-based Reward Modeling: Foundations, Theory, and…
☆70 Updated 8 months ago
shiqiangw / iclr-scores
☆54 Updated last year
louieworth / awesome-rlhf
An index of algorithms for reinforcement learning from human feedback (RLHF)
☆92 Updated last year
XanderJC / attention-based-credit
Code for the paper: Dense Reward for Free in Reinforcement Learning from Human Feedback (ICML 2024) by Alex J. Chan, Hao Sun, Samuel Holt…
☆37 Updated last year
mansicer / Q-Adapter
Implementation of ICLR 2025 paper "Q-Adapter: Customizing Pre-trained LLMs to New Preferences with Forgetting Mitigation"
☆18 Updated last year
Kaffaljidhmah2 / Arxiv-Recommender
☆51 Updated 2 years ago
cometeme / funcoder
Implementation for NeurIPS 2024 oral paper: Divide-and-Conquer Meets Consensus: Unleashing the Power of Functions in Code Generation
☆16 Updated 10 months ago
Edward-Sun / easy-to-hard
Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision
☆124 Updated last year
abdulhaim / LMRL-Gym
☆106 Updated last year
genrm-star / genrm-critiques
GenRM-CoT: Data release for verification rationales
☆66 Updated last year
YifeiZhou02 / ArCHer
Research Code for "ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL"
☆198 Updated 7 months ago
Linear95 / DSP
Domain-specific preference (DSP) data and customized RM fine-tuning.
☆25 Updated last year
thu-wyz / inference_scaling
☆76 Updated last year
ZHZisZZ / modpo
[ACL'24] Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization
☆93 Updated last year
ttumiel / minRLHF
Minimal RLHF implementation built on top of minGPT.
☆30 Updated last year
yuandong-tian / arXiv_recbot
A Telegram bot to recommend arXiv papers
☆289 Updated last month
yangzhch6 / ReSocratic
OptiBench and ReSocratic Synthesis Method
☆28 Updated 2 months ago
waterhorse1 / Natural-language-RL
Natural Language Reinforcement Learning
☆100 Updated 4 months ago
Vance0124 / Token-level-Direct-Preference-Optimization
Reference implementation for Token-level Direct Preference Optimization (TDPO)
☆148 Updated 9 months ago
swtheing / PF-PPO-RLHF
☆34 Updated last year
gregorbachmann / Next-Token-Failures
☆106 Updated last year
zhaoxlpku / SubgoalXL
☆26 Updated last year
cmu-mind / RISE
☆33 Updated last year
liziniu / ReMax
Code for the paper "ReMax: A Simple, Efficient and Effective Reinforcement Learning Method for Aligning Large Language Models"
☆199 Updated last year
yihedeng9 / rlhf-summary-notes
A brief and partial summary of RLHF algorithms.
☆139 Updated 9 months ago