fengzi258 / Ocean-R1
☆28 Updated 9 months ago
Alternatives and similar repositories for Ocean-R1
Users who are interested in Ocean-R1 are comparing it to the repositories listed below.
- a-m-team's exploration in large language modeling ☆195 Updated 7 months ago
- [Preprint] On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification. ☆520 Updated last week
- An Easy-to-use, Scalable and High-performance RLHF Framework designed for Multimodal Models. ☆154 Updated 3 months ago
- Related work and background techniques for OpenAI o1 ☆221 Updated last year
- Paper collections of multi-modal LLM for Math/STEM/Code. ☆133 Updated last month
- ☆25 Updated 9 months ago
- MMR1: Enhancing Multimodal Reasoning with Variance-Aware Sampling and Open Resources ☆211 Updated 3 months ago
- [ACL 2025, Main Conference, Oral] Intuitive Fine-Tuning: Towards Simplifying Alignment into a Single Process ☆30 Updated last year
- LoRAMoE: Revolutionizing Mixture of Experts for Maintaining World Knowledge in Language Model Alignment ☆391 Updated last year
- The official repo of INF-34B models trained by INF Technology. ☆34 Updated last year
- [ICLR 2025 Spotlight] OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text ☆410 Updated 8 months ago
- ☆192 Updated last year
- Our code for ICLR'25 paper "DataMan: Data Manager for Pre-training Large Language Models". ☆110 Updated 4 months ago
- Implementation for "Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs" ☆389 Updated 11 months ago
- This repo contains the code for "VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks" [ICLR 2025] ☆539 Updated 3 weeks ago
- MMSearch-R1 is an end-to-end RL framework that enables LMMs to perform on-demand, multi-turn search with real-world multimodal search too… ☆375 Updated 4 months ago
- Official code for paper "UniIR: Training and Benchmarking Universal Multimodal Information Retrievers" (ECCV 2024) ☆175 Updated last year
- Pre-trained, Scalable, High-performance Reward Models via Policy Discriminative Learning. ☆163 Updated 3 months ago
- Custom reward development on top of verl ☆140 Updated 7 months ago
- ☆47 Updated 11 months ago
- GPG: A Simple and Strong Reinforcement Learning Baseline for Model Reasoning ☆169 Updated 3 months ago
- [CVPR'24] RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback ☆301 Updated last year
- Extrapolating RLVR to General Domains without Verifiers ☆187 Updated 4 months ago
- InsTag: A Tool for Data Analysis in LLM Supervised Fine-tuning ☆284 Updated 2 years ago
- Harnessing 1.4M GPT4V-synthesized Data for A Lite Vision-Language Model ☆279 Updated last year
- MM-Eureka V0, also called R1-Multimodal-Journey. The latest version is in MM-Eureka. ☆322 Updated 6 months ago
- [ACL'24] Superfiltering: Weak-to-Strong Data Filtering for Fast Instruction-Tuning ☆184 Updated 6 months ago
- Code for a New Loss for Mitigating the Bias of Learning Difficulties in Generative Language Models ☆66 Updated 10 months ago
- [MM 2025] CMM-Math: A Chinese Multimodal Math Dataset To Evaluate and Enhance the Mathematics Reasoning of Large Multimodal Models ☆49 Updated last year
- A live reading list for LLM data synthesis (Updated to July, 2025). ☆435 Updated 4 months ago