multimodal-reasoning-lab / Bagel-Zebra-CoT

https://huggingface.co/datasets/multimodal-reasoning-lab/Zebra-CoT

☆103

Alternatives and similar repositories for Bagel-Zebra-CoT

Users who are interested in Bagel-Zebra-CoT are comparing it to the repositories listed below.


Fr0zenCrane / UniCoT
Uni-CoT: Towards Unified Chain-of-Thought Reasoning Across Text and Vision
☆173 Updated last week
TencentARC / Video-Holmes
Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning?
☆78 Updated 4 months ago
penghao-wu / visual_jigsaw
☆63 Updated last month
TencentARC / SEED-Bench-R1
☆94 Updated 5 months ago
thuml / MiniVeo3-Reasoner
Thinking with Videos from Open-Source Priors. We reproduce chain-of-frames visual reasoning by fine-tuning open-source video models. Give…
☆185 Updated last month
showlab / UniRL
The code repository of UniRL
☆46 Updated 6 months ago
egolife-ai / Ego-R1
Ego-R1: Chain-of-Tool-Thought for Ultra-Long Egocentric Video Reasoning
☆131 Updated 3 months ago
NVlabs / QLIP
[arXiv: 2502.05178] QLIP: Text-Aligned Visual Tokenization Unifies Auto-Regressive Multimodal Understanding and Generation
☆94 Updated 9 months ago
UMass-Embodied-AGI / Mirage
Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens (arXiv 2025)
☆197 Updated 4 months ago
TencentARC / GRPO-CARE
☆79 Updated 5 months ago
TencentARC / MindOmni
☆135 Updated last month
wusize / Harmon
[ICCV 2025] Code Release of Harmonizing Visual Representations for Unified Multimodal Understanding and Generation
☆177 Updated 6 months ago
AntResearchNLP / ViLaSR
Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing
☆79 Updated 4 months ago
Haochen-Wang409 / ross
[ICLR'25] Reconstructive Visual Instruction Tuning
☆128 Updated 7 months ago
Cooperx521 / ScaleCap
Official repository of 'ScaleCap: Inference-Time Scalable Image Captioning via Dual-Modality Debiasing'
☆58 Updated 5 months ago
gogoduan / GoT-R1
GoT-R1: Unleashing Reasoning Capability of MLLM for Visual Generation with Reinforcement Learning
☆100 Updated 6 months ago
kxfan2002 / SophiaVL-R1
SophiaVL-R1: Reinforcing MLLMs Reasoning with Thinking Reward
☆87 Updated 3 months ago
rese1f / aurora
[ICLR 2025] AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark
☆133 Updated 6 months ago
PKU-YuanGroup / UAE
Official repository for the UAE paper, unified-GRPO, and unified-Bench
☆148 Updated 2 months ago
Franklin-Zhang0 / ReasonGen-R1
Official repository for ReasonGen-R1
☆73 Updated 5 months ago
rongyaofang / PUMA
Empowering Unified MLLM with Multi-granular Visual Generation
☆131 Updated 10 months ago
aim-uofa / dLLM-MidTruth
☆55 Updated 3 months ago
RenShuhuai-Andy / NBP
Official implementation of Next Block Prediction: Video Generation via Semi-Autoregressive Modeling
☆40 Updated 9 months ago
OpenGVLab / MMIU
[ICLR2025] MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models
☆90 Updated last year
PhoenixZ810 / RISEBench
[NIPS 2025 DB Oral] Official Repository of paper: Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing
☆120 Updated last week
LanceZPF / OpenING
Official Implementation of OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation
☆36 Updated 4 months ago
Tiezheng11 / Vision-Language-Vision
☆63 Updated 4 months ago
Tencent / HaploVLM
ICML2025
☆61 Updated 3 months ago
Dongping-Chen / ISG
(ICLR 2025 Spotlight) Official code repository for Interleaved Scene Graph.
☆31 Updated 3 months ago
Wakals / CoVT
☆52 Updated last week