zhijie-group / UniCMs

☆39

Alternatives and similar repositories for UniCMs

Users who are interested in UniCMs are comparing it to the repositories listed below.

EvolvingLMMs-Lab / MGPO
High-Resolution Visual Reasoning via Multi-Turn Grounding-Based Reinforcement Learning
☆52 Updated 5 months ago
TencentARC / GRPO-CARE
☆80 Updated 6 months ago
Fr0zenCrane / UniCoT
Uni-CoT: Towards Unified Chain-of-Thought Reasoning Across Text and Vision
☆194 Updated 3 weeks ago
RenShuhuai-Andy / NBP
Official implementation of Next Block Prediction: Video Generation via Semi-Autoregressive Modeling
☆40 Updated 11 months ago
TencentARC / SEED-Bench-R1
☆96 Updated 6 months ago
showlab / UniRL
The code repository of UniRL
☆49 Updated 7 months ago
Gen-Verse / HermesFlow
[NeurIPS 2025] HermesFlow: Seamlessly Closing the Gap in Multimodal Understanding and Generation
☆73 Updated 3 months ago
TencentARC / Video-Holmes
Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning?
☆85 Updated 6 months ago
Tiezheng11 / Vision-Language-Vision
☆63 Updated 6 months ago
Espere-1119-Song / VideoNSA
VideoNSA: Native Sparse Attention Scales Video Understanding
☆78 Updated 2 months ago
OpenGVLab / TPO
Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment
☆64 Updated 5 months ago
multimodal-reasoning-lab / Bagel-Zebra-CoT
https://huggingface.co/datasets/multimodal-reasoning-lab/Zebra-CoT
☆113 Updated 2 months ago
TencentARC / MindOmni
☆140 Updated 3 months ago
NVlabs / QLIP
[arXiv: 2502.05178] QLIP: Text-Aligned Visual Tokenization Unifies Auto-Regressive Multimodal Understanding and Generation
☆94 Updated 10 months ago
Jingfeng0705 / LIFT
The official repo for LIFT: Language-Image Alignment with Fixed Text Encoders
☆42 Updated 7 months ago
KaiyueSun98 / T2I-ReasonBench
T2I-ReasonBench: Benchmarking Reasoning-Informed Text-to-Image Generation
☆35 Updated 4 months ago
HL-hanlin / Bifrost-1
Official implementation of Bifrost-1: Bridging Multimodal LLMs and Diffusion Models with Patch-level CLIP Latents (NeurIPS 2025)
☆44 Updated last month
MarkXCloud / CSpD
The official repo of continuous speculative decoding
☆31 Updated 9 months ago
markywg / transagent
[NeurIPS 2024] TransAgent: Transfer Vision-Language Foundation Models with Heterogeneous Agent Collaboration
☆26 Updated last year
Liuziyu77 / MIA-DPO
Official implementation of MIA-DPO
☆70 Updated 11 months ago
OpenGVLab / PVC
[CVPR 2025] PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models
☆51 Updated 7 months ago
si0wang / VisVM
☆46 Updated last year
sjz5202 / LLaVA-Reward
Official repository for LLaVA-Reward (ICCV 2025): Multimodal LLMs as Customized Reward Models for Text-to-Image Generation
☆22 Updated 5 months ago
Yu-xm / Unicorn
Text-Only Data Synthesis for Vision Language Model Training
☆23 Updated 7 months ago
mlvlab / DeepVideoR1
[NeurIPS 2025] Official implementation (PyTorch) of "DeepVideo-R1"
☆31 Updated 2 months ago
mengcye / LAION-SG
☆56 Updated 8 months ago
Tencent / HaploVLM
ICML 2025
☆63 Updated 4 months ago
InternLM / ARC-VL
☆35 Updated last month
showlab / D-AR
The official repo for "D-AR: Diffusion via Autoregressive Models"
☆129 Updated 6 months ago
inclusionAI / M2-Reasoning
M2-Reasoning: Empowering MLLMs with Unified General and Spatial Reasoning
☆46 Updated 6 months ago