LeslieTrue/SFTvsRL

Official implementation of paper: SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training

☆330

Alternatives and similar repositories for SFTvsRL

Users who are interested in SFTvsRL are comparing it to the libraries listed below.

RL4VLM / RL4VLM
Official Repo for Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning
☆415 · Dec 15, 2024 · Updated last year
EvolvingLMMs-Lab / open-r1-multimodal
A fork to add multimodal model training to open-r1
☆1,594 · Feb 8, 2025 · Updated last year
StarsfieldAI / R1-V
Witness the aha moment of VLM with less than $3.
☆4,064 · May 19, 2025 · Updated last year
NVlabs / Long-RL
Long-RL: Scaling RL to Long Sequences (NeurIPS 2025)
☆727 · Sep 24, 2025 · Updated 10 months ago
SalesforceAIResearch / PretrainRL-pipeline
An automated data pipeline scaling RL to pretraining levels
☆76 · Jun 2, 2026 · Updated last month
hkust-nlp / simpleRL-reason
Simple RL training for reasoning
☆3,871 · Dec 23, 2025 · Updated 7 months ago
GAIR-NLP / LIMO
[COLM 2025] LIMO: Less is More for Reasoning
☆1,080 · Jul 30, 2025 · Updated 11 months ago
zhouyiks / CoLVA
☆44 · Jul 9, 2025 · Updated last year
sail-sg / understand-r1-zero
Understanding R1-Zero-Like Training: A Critical Perspective
☆1,269 · Aug 27, 2025 · Updated 11 months ago
kxfan2002 / SophiaVL-R1
SophiaVL-R1: Reinforcing MLLMs Reasoning with Thinking Reward
☆94 · Aug 8, 2025 · Updated 11 months ago
facebookresearch / sweet_rl
Benchmark and research code for the paper "SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks"
☆271 · May 5, 2025 · Updated last year
Open-Reasoner-Zero / Open-Reasoner-Zero
Official Repo for Open-Reasoner-Zero
☆2,096 · Jun 2, 2025 · Updated last year
vsitzmann / xfactor-nvs
Public code for XFactor: Introduces the first geometry-free model to achieve true self-supervised / pose-free Novel View Synthesis (NVS) …
☆160 · May 11, 2026 · Updated 2 months ago
UCSC-VLAA / VLAA-Thinking
[TMLR 25] SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models
☆148 · Oct 10, 2025 · Updated 9 months ago
simplescaling / s1
s1: Simple test-time scaling
☆6,663 · Jun 25, 2025 · Updated last year
RLHF-V / RLHF-V
[CVPR'24] RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback
☆310 · Sep 11, 2024 · Updated last year
hiyouga / EasyR1
EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL
☆5,085 · Updated this week
multimodal-reasoning-lab / Bagel-Zebra-CoT
https://huggingface.co/datasets/multimodal-reasoning-lab/Zebra-CoT
☆137 · Jan 30, 2026 · Updated 5 months ago
Dwawayu / Pensieve
The official implementation for "Recollection from Pensieve: Novel View Synthesis via Learning from Uncalibrated Videos".
☆49 · May 23, 2025 · Updated last year
TideDra / lmm-r1
Extend OpenRLHF to support LMM RL training for reproduction of DeepSeek-R1 on multimodal tasks.
☆848 · May 14, 2025 · Updated last year
cambrian-mllm / cambrian
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
☆2,011 · Nov 7, 2025 · Updated 8 months ago
om-ai-lab / VLM-R1
Solve Visual Understanding with Reinforced VLMs
☆6,018 · Jul 7, 2026 · Updated 3 weeks ago
McGill-NLP / VinePPO
Code for the paper "VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment"
☆192 · May 25, 2025 · Updated last year
intuitive-robots / NILS
[CoRL 2024] Official code for "Scaling Robot Policy Learning via Zero-Shot Labeling with Foundation Models"
☆34 · Dec 11, 2024 · Updated last year
vision-x-nyu / thinking-in-space
Official repo and evaluation implementation of VSI-Bench
☆734 · Aug 5, 2025 · Updated 11 months ago
BytedTsinghua-SIA / DAPO
An Open-source RL System from ByteDance Seed and Tsinghua AIR
☆1,849 · May 11, 2025 · Updated last year
YuxiXie / V-DPO
Preference Learning for LLaVA
☆60 · Nov 9, 2024 · Updated last year
tilde-research / sieve
Applying SAEs for fine-grained control
☆27 · Dec 15, 2024 · Updated last year
ai-wand / concise-reasoning
Concise Reasoning via Reinforcement Learning
☆13 · Apr 16, 2025 · Updated last year
LeapLabTHU / limit-of-RLVR
repo for paper https://arxiv.org/abs/2504.13837
☆346 · Dec 17, 2025 · Updated 7 months ago
yuecao0119 / MMInstruct
[SCIS 2024] The official implementation of the paper "MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Di…
☆64 · Nov 7, 2024 · Updated last year
chentong0 / rl-binary-rar
Official repo for "Binary Retrieval-augmented Reward Mitigates Hallucinations"
☆15 · Nov 13, 2025 · Updated 8 months ago
Sun-Haoyuan23 / Awesome-RL-based-Reasoning-MLLMs
This repository provides valuable reference for researchers in the field of multimodality, please start your exploratory travel in RL-bas…
☆1,438 · May 11, 2026 · Updated 2 months ago
uclanlp / OpenVLThinker
OpenVLThinker: An Early Exploration to Vision-Language Reasoning via Iterative Self-Improvement
☆155 · May 25, 2026 · Updated 2 months ago
showlab / Show-o
[ICLR & NeurIPS 2025] Repository for Show-o series, One Single Transformer to Unify Multimodal Understanding and Generation.
☆1,965 · Jan 8, 2026 · Updated 6 months ago
BaohaoLiao / SAGE
Self-Hinting Language Models Enhance Reinforcement Learning
☆27 · Mar 28, 2026 · Updated 4 months ago
mll-lab-nu / RAGEN
RAGEN leverages reinforcement learning to train LLM reasoning agents in interactive, stochastic environments.
☆2,756 · Updated this week
TIGER-AI-Lab / AceCoder
The official repo for "AceCoder: Acing Coder RL via Automated Test-Case Synthesis" [ACL25]
☆100 · Apr 9, 2025 · Updated last year
Open-Source-O1 / o1_Reasoning_Patterns_Study
☆105 · Dec 6, 2024 · Updated last year