hkproj / dpo-notes

Notes on Direct Preference Optimization

☆21

Alternatives and similar repositories for dpo-notes

Users who are interested in dpo-notes are comparing it to the repositories listed below.

fangyuan-ksgk / Tiny-GRPO
minimal GRPO implementation from scratch
☆94 Updated 4 months ago
tigerchen52 / awesome_role_of_small_models
a curated list of the role of small models in the LLM era
☆103 Updated 10 months ago
SeunghyunSEO / optimized_hf_llama_class_for_training
☆48 Updated 11 months ago
SALT-NLP / demonstrated-feedback
☆125 Updated 10 months ago
shangshang-wang / Tina
Tina: Tiny Reasoning Models via LoRA
☆274 Updated 2 months ago
hkproj / pytorch-transformer-distributed
Distributed training (multi-node) of a Transformer model
☆76 Updated last year
tianyi-lab / MoE-Embedding
Code for "Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free"
☆76 Updated 9 months ago
facebookresearch / RAM
A framework to study AI models in Reasoning, Alignment, and use of Memory (RAM).
☆263 Updated last week
CodeCreator / WebOrganizer
Organize the Web: Constructing Domains Enhances Pre-Training Data Curation
☆58 Updated 3 months ago
salesforce / summary-of-a-haystack
Codebase accompanying the Summary of a Haystack paper.
☆79 Updated 10 months ago
arpita8 / Awesome-Mixture-of-Experts-Papers
Survey: A collection of AWESOME papers and resources on the latest research in Mixture of Experts.
☆128 Updated 11 months ago
trapoom555 / Language-Model-STS-CFT
Improving Text Embedding of Language Models Using Contrastive Fine-tuning
☆64 Updated last year
bespokelabsai / verifiers
Verifiers for LLM Reinforcement Learning
☆68 Updated 3 months ago
ytyz1307zzh / RefAug
Code for EMNLP 2024 paper "Learn Beyond The Answer: Training Language Models with Reflection for Mathematical Reasoning"
☆55 Updated 10 months ago
zai-org / ComplexFuncBench
Complex Function Calling Benchmark.
☆123 Updated 6 months ago
sanyalsunny111 / LLM-Inheritune
This is the official repository for Inheritune.
☆112 Updated 5 months ago
AIMO-CMU-MATH / CMU_MATH-AIMO
☆76 Updated last year
efficientscaling / Z1
Repo for "Z1: Efficient Test-time Scaling with Code"
☆63 Updated 3 months ago
cmu-l3 / anlp-spring2025-code
Advanced NLP, Spring 2025 https://cmu-l3.github.io/anlp-spring2025/
☆61 Updated 4 months ago
QwenLM / WorldPM
☆90 Updated 2 months ago
Mohammadjafari80 / GSM8K-RLVR
A simplified implementation for experimenting with RLVR on GSM8K. This repository provides a starting point for exploring reasoning.
☆119 Updated 6 months ago
THU-KEG / Agentic-Reward-Modeling
[ACL 2025] Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems
☆99 Updated last month
cambridgeltl / PairS
Aligning with Human Judgement: The Role of Pairwise Preference in Large Language Model Evaluators (Liu et al.; COLM 2024)
☆47 Updated 6 months ago
Nardien / agent-distillation
Official Code Repository for the paper "Distilling LLM Agent into Small Models with Retrieval and Code Tools"
☆130 Updated this week
kyegomez / Infini-attention
Implementation of the paper: "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention" from Google in pyTO…
☆56 Updated last week
cmu-l3 / neurips2024-inference-tutorial-code
NeurIPS 2024 tutorial on LLM Inference
☆45 Updated 7 months ago
DataArcTech / LLM-as-a-Judge
☆128 Updated 4 months ago
hkproj / rlhf-ppo
Notes and commented code for RLHF (PPO)
☆101 Updated last year
rsshyam / GRPO
☆64 Updated last year
daniel-furman / sft-demos
Lightweight demos for finetuning LLMs. Powered by 🤗 transformers and open-source datasets.
☆77 Updated 9 months ago