PKU-YuanGroup / LLaVA-o1
☆55 · Updated 3 months ago
Alternatives and similar repositories for LLaVA-o1:
Users interested in LLaVA-o1 are comparing it to the repositories listed below.
- OLA-VLM: Elevating Visual Perception in Multimodal LLMs with Auxiliary Embedding Distillation, arXiv 2024 ☆57 · Updated 2 weeks ago
- Code for the paper "Harnessing Webpage UIs for Text-Rich Visual Understanding" ☆48 · Updated 3 months ago
- A minimal implementation of a LLaVA-style VLM with interleaved image, text, and video processing ability. ☆89 · Updated 2 months ago
- Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs ☆74 · Updated 4 months ago
- Rethinking Step-by-step Visual Reasoning in LLMs ☆268 · Updated last month
- LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture ☆198 · Updated 2 months ago
- ☆30 · Updated last month
- This is the repo for the paper "Pangea: A Fully Open Multilingual Multimodal LLM for 39 Languages" ☆104 · Updated 3 months ago
- The official repository for "2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining" ☆144 · Updated last month
- ☆60 · Updated last month
- Video-LLaVA fine-tune for CinePile evaluation ☆49 · Updated 7 months ago
- ☆36 · Updated last year
- An open-source implementation for fine-tuning Molmo-7B-D and Molmo-7B-O by allenai. ☆52 · Updated last month
- Parameter-efficient finetuning script for Phi-3-vision, the strong multimodal language model by Microsoft. ☆58 · Updated 8 months ago
- ✨✨Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models ☆154 · Updated 2 months ago
- This project is a collection of fine-tuning scripts to help researchers fine-tune Qwen2-VL on HuggingFace datasets. ☆64 · Updated 5 months ago
- Official repository for the paper "NeuZip: Memory-Efficient Training and Inference with Dynamic Compression of Neural Networks". This rep… ☆54 · Updated 4 months ago
- ☆13 · Updated 3 months ago
- Resources for our paper: "EvoAgent: Towards Automatic Multi-Agent Generation via Evolutionary Algorithms" ☆84 · Updated 4 months ago
- TinyClick: Single-Turn Agent for Empowering GUI Automation ☆30 · Updated 4 months ago
- A Framework for Decoupling and Assessing the Capabilities of VLMs ☆40 · Updated 8 months ago
- A list of language models with permissive licenses such as MIT or Apache 2.0 ☆24 · Updated last week
- A novel multi-modality (vision) RAG architecture ☆23 · Updated 5 months ago
- Vision Search Assistant: Empower Vision-Language Models as Multimodal Search Engines ☆117 · Updated 4 months ago
- FuseAI Project ☆83 · Updated last month
- Here we will track the latest AI Multimodal Models, including Multimodal Foundation Models, LLM, Agent, Audio, Image, Video, Music and 3D… ☆34 · Updated last month
- Chat with Phi 3.5/3 Vision LLMs. Phi-3.5-vision is a lightweight, state-of-the-art open multimodal model built upon datasets which includ… ☆33 · Updated 2 months ago
- Official code of *Virgo: A Preliminary Exploration on Reproducing o1-like MLLM* ☆95 · Updated 2 weeks ago