LeslieTrue / SFTvsRL
Official implementation of paper: SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
☆255 Updated last month
Alternatives and similar repositories for SFTvsRL:
Users interested in SFTvsRL are comparing it to the repositories listed below.
- ☆262 Updated 2 weeks ago
- L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning ☆162 Updated last week
- [NeurIPS 2024] Official Repository of The Mamba in the Llama: Distilling and Accelerating Hybrid Models ☆209 Updated 3 weeks ago
- Python Library to evaluate VLM models' robustness across diverse benchmarks ☆195 Updated last week
- [ICLR 2025] VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation ☆249 Updated 2 months ago
- Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models ☆439 Updated this week
- Cosmos-Reason1 models understand the physical common sense and generate appropriate embodied decisions in natural language through long c… ☆218 Updated this week
- The official repository for "2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining" ☆148 Updated 2 weeks ago
- A brief and partial summary of RLHF algorithms. ☆127 Updated 3 weeks ago
- ☆188 Updated last month
- LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture ☆200 Updated 2 months ago
- A Survey on Efficient Reasoning for LLMs ☆204 Updated this week
- ☆84 Updated last month
- Rethinking Step-by-step Visual Reasoning in LLMs ☆282 Updated 2 months ago
- Auto Interpretation Pipeline and many other functionalities for Multimodal SAE Analysis. ☆124 Updated 2 months ago
- ☆171 Updated last month
- A framework to study AI models in Reasoning, Alignment, and use of Memory (RAM). ☆209 Updated last week
- Repo of paper "Free Process Rewards without Process Labels" ☆138 Updated 2 weeks ago
- Official implementation of "DoRA: Weight-Decomposed Low-Rank Adaptation" ☆123 Updated 11 months ago
- Benchmark and research code for the paper SWEET-RL Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks ☆140 Updated this week
- Implementation of 🥥 Coconut, Chain of Continuous Thought, in Pytorch ☆161 Updated 3 months ago
- ☆129 Updated this week
- 🌾 OAT: A research-friendly framework for LLM online alignment, including preference learning, reinforcement learning, etc. ☆300 Updated last week
- EVE Series: Encoder-Free Vision-Language Models from BAAI ☆314 Updated last month
- Official implementation of the Law of Vision Representation in MLLMs ☆151 Updated 4 months ago
- Code and example data for the paper: Rule Based Rewards for Language Model Safety ☆183 Updated 8 months ago
- Explore the Multimodal “Aha Moment” on 2B Model ☆538 Updated last week
- An open source implementation of CLIP (With TULIP Support) ☆113 Updated last week
- [CVPR2025] Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models ☆172 Updated last week
- [NeurIPS 2024] Code for the paper "Diffusion of Thoughts: Chain-of-Thought Reasoning in Diffusion Language Models" ☆138 Updated 3 weeks ago