Tencent/HaploVLM

ICML2025

☆63

Alternatives and similar repositories for HaploVLM

Users who are interested in HaploVLM are comparing it to the repositories listed below.

TencentARC / MindOmni
[NeurIPS2025] The official implementation of MindOmni: Unleashing Reasoning Generation in Vision Language Models with RGPO
☆139 · Oct 15, 2025 · Updated 9 months ago
lyk412 / Consistent123
[ACMMM 2024] Consistent123: One Image to Highly Consistent 3D Asset Using Case-Aware Diffusion Priors
☆25 · Oct 22, 2024 · Updated last year
SAIS-FUXI / Omni-Video
☆156 · Feb 28, 2026 · Updated 4 months ago
huangrh99 / AlphaGRPO
[ICML2026] Official Implementation of AlphaGRPO: Unlocking Self-Reflective Multimodal Generation in Unified Multimodal Models via Decompo…
☆73 · Jul 14, 2026 · Updated last week
TencentARC / Divot
Diffusion Powers Video Tokenizer for Comprehension and Generation (CVPR 2025)
☆87 · Feb 27, 2025 · Updated last year
TencentARC / ARC-Hunyuan-Video-7B
Structured Video Comprehension of Real-World Shorts
☆238 · Sep 21, 2025 · Updated 10 months ago
ByteDance-Seed / SAIL
Implementation for "The Scalability of Simplicity: Empirical Analysis of Vision-Language Learning with a Single Transformer"
☆85 · Oct 29, 2025 · Updated 8 months ago
Junchao-cs / LIVE
[ICML 2026] "LIVE: Long-horizon Interactive Video World ModEling"
☆35 · Updated this week
VincentHancoder / ViGoR-Bench-Eval
☆34 · Apr 5, 2026 · Updated 3 months ago
RobertLuo1 / NeurIPS2023_SOC
[NeurIPS 2023] The official implementation of SOC: Semantic-Assisted Object Cluster for Referring Video Object Segmentation
☆33 · Mar 16, 2024 · Updated 2 years ago
TencentARC / ARC-Chapter
Structuring Hour-Long Videos into Navigable Chapters and Hierarchical Summaries
☆44 · Nov 19, 2025 · Updated 8 months ago
zhouyiks / CoLVA
☆44 · Jul 9, 2025 · Updated last year
TencentARC / TokLIP
TokLIP: Marry Visual Tokens to CLIP for Multimodal Comprehension and Generation
☆236 · Aug 18, 2025 · Updated 11 months ago
UCSC-VLAA / Complex-Edit
Complex-Edit: CoT-Like Instruction Generation for Complexity-Controllable Image Editing Benchmark
☆29 · Apr 22, 2025 · Updated last year
csuhan / Tar
[NeurIPS 2025] Vision as a Dialect: Unifying Visual Understanding and Generation via Text-Aligned Representations
☆202 · Sep 18, 2025 · Updated 10 months ago
mashijie1028 / GenHancer
(ICCV 2025) Enhance CLIP and MLLM's fine-grained visual representations with generative models.
☆78 · Jun 25, 2025 · Updated last year
Yangr116 / VST
[ECCV2026] Visual Spatial Tuning
☆199 · Mar 25, 2026 · Updated 3 months ago
hzphzp / WeGen
☆27 · Apr 25, 2025 · Updated last year
apple / ml-unigen
UniGen: Enhanced Training & Test-Time Strategies for Unified Multimodal Understanding and Generation
☆43 · Nov 24, 2025 · Updated 7 months ago
wren93 / tuna
☆94 · Apr 29, 2026 · Updated 2 months ago
TencentARC / Video-Holmes
[ECCV 2026] Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning?
☆95 · Jul 13, 2025 · Updated last year
JaydenLyh / Reward-Forcing
[CVPR 2026 Highlight] Reward Forcing: Efficient Streaming Video Generation with Rewarded Distribution Matching Distillation
☆352 · Dec 15, 2025 · Updated 7 months ago
Hannieliao / Baton
Official Repository of IJCAI 2024 Paper: "BATON: Aligning Text-to-Audio Model with Human Preference Feedback"
☆32 · Mar 4, 2025 · Updated last year
KaiyueSun98 / T2I-Personalization-with-AR
☆47 · Apr 20, 2025 · Updated last year
Lyne1 / Realgeneral
RealGeneral (ICCV2025)
☆17 · Jul 16, 2025 · Updated last year
lyrig / TokenAR
TokenAR: Multiple Subject Generation via Autoregressive Token-level enhancement
☆22 · Mar 4, 2026 · Updated 4 months ago
jylins / hourllava
[NeurIPS 2025 Spotlight] Unleashing Hour-Scale Video Training for Long Video-Language Understanding
☆19 · Jun 24, 2025 · Updated last year
EasonXiao-888 / SpatialEdit
[Official Repo] SpatialEdit: Benchmarking Fine-Grained Image Spatial Editing
☆214 · Apr 13, 2026 · Updated 3 months ago
inclusionAI / Ming-UniVision
Code release for Ming-UniVision: Joint Image Understanding and Generation with a Continuous Unified Tokenizer
☆143 · Oct 14, 2025 · Updated 9 months ago
longmalongma / TW-GRPO
The official repository of our paper "Reinforcing Video Reasoning with Focused Thinking"
☆36 · Jun 12, 2025 · Updated last year
VincentHancoder / AToM
The official implementation of work "AToM: Aligning Text-to-Motion Model at Event-Level with GPT-4Vision Reward".
☆19 · Mar 25, 2025 · Updated last year
rongyaofang / GoT
Official repository of "GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing"
☆317 · Sep 28, 2025 · Updated 9 months ago
causalfusion / causalfusion
☆196 · Dec 17, 2024 · Updated last year
onecat-ai / OneCAT
OneCAT: Decoder-Only Auto-Regressive Model for Unified Understanding and Generation
☆261 · Sep 22, 2025 · Updated 9 months ago
ant-research / lumos
[CVPR'25 - Rating 555] Official PyTorch implementation of Lumos: Learning Visual Generative Priors without Text
☆52 · Mar 16, 2025 · Updated last year
tliby / UniFork
UniFork: Exploring Modality Alignment for Unified Multimodal Understanding and Generation
☆48 · Aug 26, 2025 · Updated 10 months ago
EasonXiao-888 / UVCOM
[CVPR 2024] Bridging the Gap: A Unified Video Comprehension Framework for Moment Retrieval and Highlight Detection
☆117 · Jul 17, 2024 · Updated 2 years ago
huawei-lin / VTBench
This repository provides the official implementation of VTBench, a benchmark designed to evaluate the performance of visual tokenizers (V…
☆35 · Jul 30, 2025 · Updated 11 months ago
zhuangshaobin / Video-GPT
[ICLR2026] Video-GPT via Next Clip Diffusion.
☆46 · Jun 2, 2025 · Updated last year