LMM101 / Awesome-Multimodal-Next-Token-Prediction

[Survey] Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey

☆451

Alternatives and similar repositories for Awesome-Multimodal-Next-Token-Prediction

Users who are interested in Awesome-Multimodal-Next-Token-Prediction are comparing it to the repositories listed below.

showlab / Awesome-Unified-Multimodal-Models
📖 This is a repository for organizing papers, codes and other resources related to unified multimodal models.
☆725 Updated 2 weeks ago
CodeGoat24 / UnifiedReward
Official implementation of UnifiedReward & [NeurIPS 2025] UnifiedReward-Think
☆574 Updated this week
NVlabs / Long-RL
Long-RL: Scaling RL to Long Sequences (NeurIPS 2025)
☆644 Updated last month
Visual-Agent / DeepEyes
☆883 Updated this week
turningpoint-ai / VisualThinker-R1-Zero
Explore the Multimodal “Aha Moment” on 2B Model
☆613 Updated 7 months ago
tulerfeng / Video-R1
Video-R1: Reinforcing Video Reasoning in MLLMs [🔥the first paper to explore R1 for video]
☆718 Updated last month
AIDC-AI / Awesome-Unified-Multimodal-Models
Awesome Unified Multimodal Models
☆805 Updated 2 months ago
mit-han-lab / vila-u
[ICLR 2025] VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation
☆398 Updated 6 months ago
ByteVisionLab / TokenFlow
[CVPR 2025] 🔥 Official impl. of "TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation".
☆393 Updated 2 months ago
yaotingwangofficial / Awesome-MCoT
Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey
☆849 Updated 2 months ago
Wang-Xiaodong1899 / Open-R1-Video
✨First Open-Source R1-like Video-LLM [2025/02/18]
☆369 Updated 8 months ago
Osilly / Vision-R1
This is the first paper to explore how to effectively use R1-like RL for MLLMs and introduce Vision-R1, a reasoning MLLM that leverages …
☆717 Updated last month
Purshow / Awesome-Unified-Multimodal
📖 This is a repository for organizing papers, codes, and other resources related to unified multimodal models.
☆319 Updated last week
FanqingM / MM-Eureka-V0
MM-Eureka V0, also called R1-Multimodal-Journey. The latest version is in MM-Eureka.
☆320 Updated 4 months ago
ModalMinds / MM-EUREKA
MM-EUREKA: Exploring the Frontiers of Multimodal Reasoning with Rule-based Reinforcement Learning
☆751 Updated last month
swordlidev / Efficient-Multimodal-LLMs-Survey
Efficient Multimodal Large Language Models: A Survey
☆373 Updated 5 months ago
YingqingHe / Awesome-LLMs-meet-Multimodal-Generation
🔥🔥🔥 A curated list of papers on LLMs-based multimodal generation (image, video, 3D and audio).
☆509 Updated 6 months ago
Victorwz / Open-Qwen2VL
[COLM 2025] Open-Qwen2VL: Compute-Efficient Pre-Training of Fully-Open Multimodal LLMs on Academic Resources
☆274 Updated 2 months ago
baaivision / EVE
EVE Series: Encoder-Free Vision-Language Models from BAAI
☆353 Updated 3 months ago
yongliang-wu / DFT
[Preprint] On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification.
☆471 Updated this week
TideDra / lmm-r1
Extend OpenRLHF to support LMM RL training for reproduction of DeepSeek-R1 on multimodal tasks.
☆825 Updated 5 months ago
Fancy-MLLM / R1-Onevision
R1-onevision, a visual language model capable of deep CoT reasoning.
☆569 Updated 6 months ago
Kwai-YuanQi / MM-RLHF
The Next Step Forward in Multimodal LLM Alignment
☆183 Updated 5 months ago
yfzhang114 / Awesome-Multimodal-Large-Language-Models
Reading notes about Multimodal Large Language Models, Large Language Models, and Diffusion Models
☆683 Updated last month
FoundationVision / UniTok
[NeurIPS 2025 Spotlight] A Unified Tokenizer for Visual Generation and Understanding
☆425 Updated last month
zhaochen0110 / Awesome_Think_With_Images
Resources and paper list for "Thinking with Images for LVLMs". This repository accompanies our survey on how LVLMs can leverage visual in…
☆1,033 Updated 3 weeks ago
deepcs233 / Visual-CoT
[NeurIPS'24 Spotlight] Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought …
☆387 Updated 10 months ago
RunpeiDong / DreamLLM
[ICLR 2024 Spotlight] DreamLLM: Synergistic Multimodal Comprehension and Creation
☆459 Updated 10 months ago
ML-GSAI / LLaDA-V
☆254 Updated last week
Mini-o3 / Mini-o3
Official Code for "Mini-o3: Scaling Up Reasoning Patterns and Interaction Turns for Visual Search"
☆342 Updated last month