fengzi258 / Ocean-R1
☆29 Updated 10 months ago
Alternatives and similar repositories for Ocean-R1
Users who are interested in Ocean-R1 are comparing it to the repositories listed below.
- Paper collections of multi-modal LLMs for Math/STEM/Code. ☆135 Updated 2 months ago
- a-m-team's exploration in large language modeling ☆195 Updated 8 months ago
- LoRAMoE: Revolutionizing Mixture of Experts for Maintaining World Knowledge in Language Model Alignment ☆395 Updated last year
- ☆48 Updated 11 months ago
- An Easy-to-use, Scalable and High-performance RLHF Framework designed for Multimodal Models. ☆154 Updated last month
- [ICLR 2026] On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification. ☆532 Updated last month
- ☆196 Updated last year
- [ICLR 2025 Spotlight] OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text ☆410 Updated 9 months ago
- Our code for ICLR'25 paper "DataMan: Data Manager for Pre-training Large Language Models". ☆114 Updated 5 months ago
- [CVPR'24] RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback ☆306 Updated last year
- [ICLR26] GPG: A Simple and Strong Reinforcement Learning Baseline for Model Reasoning ☆173 Updated last week
- [MM 2025] CMM-Math: A Chinese Multimodal Math Dataset To Evaluate and Enhance the Mathematics Reasoning of Large Multimodal Models ☆50 Updated last year
- MM-Eureka V0, also called R1-Multimodal-Journey; the latest version is in MM-Eureka ☆322 Updated 7 months ago
- This repo contains the code for "VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks" [ICLR 2025] ☆565 Updated last week
- ☆218 Updated 2 months ago
- MMR1: Enhancing Multimodal Reasoning with Variance-Aware Sampling and Open Resources ☆214 Updated 4 months ago
- [ACL 2025, Main Conference, Oral] Intuitive Fine-Tuning: Towards Simplifying Alignment into a Single Process ☆30 Updated last year
- MMSearch-R1 is an end-to-end RL framework that enables LMMs to perform on-demand, multi-turn search with real-world multimodal search too… ☆387 Updated 5 months ago
- Extrapolating RLVR to General Domains without Verifiers ☆196 Updated 5 months ago
- ☆175 Updated last year
- Related works and background techniques for OpenAI o1 ☆220 Updated last year
- ☆25 Updated 9 months ago
- Implementation for "Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs" ☆390 Updated last year
- Code for a New Loss for Mitigating the Bias of Learning Difficulties in Generative Language Models ☆67 Updated 11 months ago
- Official code for paper "UniIR: Training and Benchmarking Universal Multimodal Information Retrievers" (ECCV 2024) ☆177 Updated last year
- An RLHF Infrastructure for Vision-Language Models ☆195 Updated last year
- Custom reward development on top of verl ☆144 Updated 8 months ago
- ☆1,112 Updated 2 months ago
- This repository contains the code for SFT, RLHF, and DPO, designed for vision-based LLMs, including the LLaVA models and the LLaMA-3.2-vi… ☆117 Updated 7 months ago
- [ICLR 2025] 🧬 RegMix: Data Mixture as Regression for Language Model Pre-training (Spotlight) ☆185 Updated 11 months ago