fengzi258 / Ocean-R1
☆28Updated 5 months ago
Alternatives and similar repositories for Ocean-R1
Users who are interested in Ocean-R1 are comparing it to the libraries listed below.
- a-m-team's exploration in large language modeling☆182Updated 2 months ago
- LoRAMoE: Revolutionizing Mixture of Experts for Maintaining World Knowledge in Language Model Alignment☆361Updated last year
- ☆183Updated last year
- Related work and background techniques for OpenAI o1☆224Updated 7 months ago
- [CVPR'24] RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback☆287Updated 11 months ago
- An Easy-to-use, Scalable and High-performance RLHF Framework designed for Multimodal Models.☆139Updated 4 months ago
- ☆47Updated 6 months ago
- Paper collections of multi-modal LLM for Math/STEM/Code.☆118Updated 2 weeks ago
- Fantastic Data Engineering for Large Language Models☆89Updated 7 months ago
- ☆198Updated 3 months ago
- Our code for ICLR'25 paper "DataMan: Data Manager for Pre-training Large Language Models".☆88Updated this week
- This repo contains the code for "VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks" [ICLR 2025]☆355Updated last week
- A live reading list for LLM-synthetic-data.☆347Updated this week
- InsTag: A Tool for Data Analysis in LLM Supervised Fine-tuning☆267Updated last year
- Reading notes about Multimodal Large Language Models, Large Language Models, and Diffusion Models☆558Updated last month
- MMR1: Advancing the Frontiers of Multimodal Reasoning☆162Updated 4 months ago
- ☆719Updated last month
- Latest Advances on Reasoning of Multimodal Large Language Models (Multimodal R1 \ Visual R1) 🍓☆34Updated 4 months ago
- Custom reward development on verl☆94Updated 2 months ago
- MM-Eureka V0, also called R1-Multimodal-Journey; the latest version is in MM-Eureka☆313Updated last month
- ☆145Updated last year
- OpenRFT: Adapting Reasoning Foundation Model for Domain-specific Tasks with Reinforcement Fine-Tuning☆147Updated 7 months ago
- The official repo of INF-34B models trained by INF Technology.☆35Updated last year
- [ACL 2025, Main Conference, Oral] Intuitive Fine-Tuning: Towards Simplifying Alignment into a Single Process☆30Updated last year
- Code for a New Loss for Mitigating the Bias of Learning Difficulties in Generative Language Models☆65Updated 5 months ago
- ☆83Updated last year
- [SIGIR'24] The official implementation code of MOELoRA.☆175Updated last year
- Implementation for "Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs"☆376Updated 6 months ago
- ☆206Updated 9 months ago
- [NAACL'24] Self-data filtering of LLM instruction-tuning data using a novel perplexity-based difficulty score, without using any other mo…☆382Updated last month