EasonXiao-888 / MindOmni

☆21

Alternatives and similar repositories for MindOmni

Users who are interested in MindOmni are comparing it to the repositories listed below.

TencentARC / Video-Holmes
Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning?
☆49 Updated this week
Yu-xm / Unicorn
Text-Only Data Synthesis for Vision Language Model Training
☆18 Updated 3 weeks ago
showlab / FQGAN
FQGAN: Factorized Visual Tokenization and Generation
☆50 Updated 2 months ago
showlab / UniRL
The code repository of UniRL
☆20 Updated last week
aniki-ly / FreeLong
[NeurIPS 2024] The official implementation of the research paper "FreeLong: Training-Free Long Video Generation with SpectralBlend Temporal Atten…
☆44 Updated 3 months ago
TencentARC / TokLIP
TokLIP: Marry Visual Tokens to CLIP for Multimodal Comprehension and Generation
☆23 Updated 3 weeks ago
Yxxxb / LAVT-RS
[CVPR'2022, TPAMI'2024] LAVT: Language-Aware Vision Transformer for Referring Segmentation
☆20 Updated 4 months ago
THUDM / MotionBench
Official code for MotionBench (CVPR 2025)
☆40 Updated 3 months ago
SuleBai / SC-CLIP
Self-Calibrated CLIP for Training-Free Open-Vocabulary Segmentation
☆48 Updated 2 weeks ago
AndyTang15 / FLAG3Dv2
☆21 Updated last year
jialuli-luka / Video-MSG
Training-free Guidance in Text-to-Video Generation via Multimodal Planning and Structured Noise Initialization
☆21 Updated last month
RenShuhuai-Andy / NBP
Official implementation of Next Block Prediction: Video Generation via Semi-Autoregressive Modeling
☆31 Updated 3 months ago
TencentARC / SEED-Bench-R1
☆81 Updated 2 months ago
JoeLeelyf / OVO-Bench
[CVPR 2025] OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding?
☆59 Updated 2 months ago
jiyt17 / Prompt-A-Video
☆13 Updated 4 months ago
gogoduan / GoT-R1
GoT-R1: Unleashing Reasoning Capability of MLLM for Visual Generation with Reinforcement Learning
☆70 Updated last week
SilentView / LVD-2M
[NeurIPS 2024 D&B Track] Official Repo for "LVD-2M: A Long-take Video Dataset with Temporally Dense Captions"
☆59 Updated 7 months ago
shiyi-zh0408 / NAE_CVPR2024
Accepted by CVPR 2024
☆33 Updated last year
xizaoqu / MOFT
[NeurIPS 2024] Video Diffusion Models are Training-free Motion Interpreter and Controller
☆42 Updated last month
3DTopia / GenDoP
GenDoP: Auto-regressive Camera Trajectory Generation as a Director of Photography
☆62 Updated this week
KaiyueSun98 / T2I-Personalization-with-AR
☆43 Updated last month
DAMO-NLP-SG / DiGIT
[NeurIPS 2024] Stabilize the Latent Space for Image Autoregressive Modeling: A Unified Perspective
☆69 Updated 7 months ago
hmxiong / StreamChat
[ICLR 2025] Official repo for "Streaming Video Understanding and Multi-round Interaction with Memory-enhanced Knowledge"
☆53 Updated 2 months ago
yaolinli / TimeChat-Online
TimeChat-Online: 80% Visual Tokens are Naturally Redundant in Streaming Videos
☆42 Updated 2 weeks ago
PhoenixZ810 / RISEBench
Official repository of the paper "Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing"
☆53 Updated last week
sming256 / BOLT
[CVPR 2025] BOLT: Boost Large Vision-Language Model Without Training for Long-form Video Understanding
☆20 Updated 2 months ago
Jialuo-Li / Science-T2I
[CVPR 2025] Science-T2I: Addressing Scientific Illusions in Image Synthesis
☆55 Updated last month
PKU-YuanGroup / WISE
WISE: A World Knowledge-Informed Semantic Evaluation for Text-to-Image Generation
☆105 Updated this week
hu-zijing / B2-DiffuRL
[CVPR 2025] A framework named B^2-DiffuRL for RL-based diffusion model fine-tuning.
☆29 Updated 2 months ago
zhangguiwei610 / V2Flow
☆23 Updated 2 months ago