LeslieTrue / SFTvsRL
Official implementation of paper: SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
☆269Updated last week
Alternatives and similar repositories for SFTvsRL:
Users who are interested in SFTvsRL are comparing it to the repositories listed below
- Official Implementation for the paper "d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning"☆100Updated 2 weeks ago
- ☆287Updated last month
- Python Library to evaluate VLM models' robustness across diverse benchmarks☆204Updated this week
- [ICLR 2025] VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation☆308Updated last week
- Official implementation of the Law of Vision Representation in MLLMs☆154Updated 5 months ago
- EVE Series: Encoder-Free Vision-Language Models from BAAI☆324Updated 2 months ago
- CPPO: Accelerating the Training of Group Relative Policy Optimization-Based Reasoning Models☆121Updated last week
- ☆192Updated 2 months ago
- Code for MetaMorph: Multimodal Understanding and Generation via Instruction Tuning☆151Updated 2 weeks ago
- Official repository for “Reinforcement Learning for Reasoning in Large Language Models with One Training Example”☆52Updated this week
- ☆163Updated last month
- Repo of paper "Free Process Rewards without Process Labels"☆145Updated last month
- Benchmark and research code for the paper SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks☆186Updated 3 weeks ago
- L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning☆195Updated last month
- This repo contains the code for "MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks" [ICLR 2025]☆65Updated 2 weeks ago
- [Survey] Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey☆422Updated 3 months ago
- [NeurIPS 2024] Official Repository of The Mamba in the Llama: Distilling and Accelerating Hybrid Models☆215Updated this week
- ☆101Updated 3 weeks ago
- Codes for Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models☆206Updated 6 months ago
- A brief and partial summary of RLHF algorithms.☆128Updated 2 months ago
- Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme☆120Updated 3 weeks ago
- This repo contains evaluation code for the paper "BLINK: Multimodal Large Language Models Can See but Not Perceive". https://arxiv.or…☆124Updated 10 months ago
- This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for E…☆422Updated 2 weeks ago
- Repo for Rho-1: Token-level Data Selection & Selective Pretraining of LLMs.☆409Updated last year
- Official code of *Virgo: A Preliminary Exploration on Reproducing o1-like MLLM*☆100Updated 2 months ago
- ☆111Updated this week
- A framework to study AI models in Reasoning, Alignment, and use of Memory (RAM).☆222Updated last month
- MathVista: data, code, and evaluation for Mathematical Reasoning in Visual Contexts☆304Updated 5 months ago
- Rethinking Step-by-step Visual Reasoning in LLMs☆292Updated 3 months ago
- Auto Interpretation Pipeline and many other functionalities for Multimodal SAE Analysis.☆128Updated 3 months ago