xmu-xiaoma666 / Multimodal-Open-O1Links

Multimodal Open-O1 (MO1) is designed to enhance the accuracy of inference models by utilizing a novel prompt-based approach. This tool works locally and aims to create inference chains akin to those used by OpenAI-o1, but with localized processing power.

☆29

Alternatives and similar repositories for Multimodal-Open-O1

Users that are interested in Multimodal-Open-O1 are comparing it to the libraries listed below

Sorting:

ZhangXJ199 / TinyLLaVA-Video
A Simple Framework of Small-scale LMMs for Video Understanding
☆73Updated last month
cnzzx / VSA
Vision Search Assistant: Empower Vision-Language Models as Multimodal Search Engines
☆125Updated 8 months ago
ggg0919 / cantor
☆86Updated last year
yangjie-cv / WeThink
WeThink: Toward General-purpose Vision-Language Reasoning via Reinforcement Learning
☆32Updated last month
Hon-Wong / ByteVideoLLM
[ICCV 2025] Dynamic-VLM
☆23Updated 7 months ago
markywg / transagent
[NeurIPS 2024] TransAgent: Transfer Vision-Language Foundation Models with Heterogeneous Agent Collaboration
☆24Updated 9 months ago
ding523 / Curr_REFT
☆67Updated 2 months ago
xjtupanda / Sparrow
Repo for paper "T2Vid: Translating Long Text into Multi-Image is the Catalyst for Video-LLMs"
☆49Updated 4 months ago
OpenGVLab / TPO
Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment
☆53Updated last week
bigai-nlco / VideoLLaMB
[ICCV 2025] Official Repository of VideoLLaMB: Long Video Understanding with Recurrent Memory Bridges
☆71Updated 5 months ago
invictus717 / MiCo
[ICCV'25] Explore the Limits of Omni-modal Pretraining at Scale
☆111Updated 11 months ago
deepglint / UniME
[ACM MM25] The official code of "Breaking the Modality Barrier: Universal Embedding Learning with Multimodal LLMs"
☆83Updated 3 weeks ago
thunlp / Migician
[ACL2025 Findings] Migician: Revealing the Magic of Free-Form Multi-Image Grounding in Multimodal Large Language Models
☆71Updated 2 months ago
alibaba / conv-llava
☆118Updated last year
eric-ai-lab / GRIT
Official code for paper "GRIT: Teaching MLLMs to Think with Images"
☆114Updated this week
EvolvingLMMs-Lab / MGPO
High-Resolution Visual Reasoning via Multi-Turn Grounding-Based Reinforcement Learning
☆43Updated last week
yuyq96 / TextHawk
Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models
☆62Updated 9 months ago
om-ai-lab / ZoomEye
ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration
☆47Updated 7 months ago
FudanNLPLAB / MouSi
☆73Updated last year
LanceZPF / OpenING
Official Implementation of OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation
☆28Updated 3 weeks ago
jihaonew / MM-Instruct
MM-Instruct: Generated Visual Instructions for Large Multimodal Model Alignment
☆35Updated last year
EvolvingLMMs-Lab / VideoMMMU
☆51Updated last month
CuriseJia / ECCV24-FreeStyleRet
Precision Search through Multi-Style Inputs
☆71Updated this week
Liuziyu77 / MIA-DPO
Official implement of MIA-DPO
☆62Updated 6 months ago
TheEighthDay / SeekWorld
The first attempt to replicate o3-like visual clue-tracking reasoning capabilities.
☆58Updated 3 weeks ago
Open-Reasoner-Zero / Open-Vision-Reasoner
The official repository for our paper, "Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning".
☆118Updated 3 weeks ago
SparksJoe / Prism
A Framework for Decoupling and Assessing the Capabilities of VLMs
☆44Updated last year
PhoenixZ810 / MG-LLaVA
Official repository for paper MG-LLaVA: Towards Multi-Granularity Visual Instruction Tuning(https://arxiv.org/abs/2406.17770).
☆156Updated 10 months ago
sterzhang / image-textualization
Image Textualization: An Automatic Framework for Generating Rich and Detailed Image Descriptions (NeurIPS 2024)
☆164Updated last year
inst-it / inst-it
Official repository of "Inst-IT: Boosting Multimodal Instance Understanding via Explicit Visual Prompt Instruction Tuning"
☆35Updated 5 months ago