KlingTeam / MODA

[ICML 2025 Spotlight] MODA: MOdular Duplex Attention for Multimodal Perception, Cognition, and Emotion Understanding

☆66

Alternatives and similar repositories for MODA

Users that are interested in MODA are comparing it to the libraries listed below


Hoar012 / RAP-MLLM
[CVPR 2025] RAP: Retrieval-Augmented Personalization
☆79 · Updated 2 months ago
aspirinone / CATR.github.io
☆32 · Updated last year
TencentARC / ARC-Hunyuan-Video-7B
Structured Video Comprehension of Real-World Shorts
☆230 · Updated 4 months ago
Go2Heart / StreamFormer
[ICCV 2025 Oral] Official implementation of Learning Streaming Video Representation via Multitask Training.
☆80 · Updated last month
JingyuanYY / EmoGen
This is the official implementation of the CVPR 2024 paper "EmoGen: Emotional Image Content Generation with Text-to-Image Diffusion Models".
☆92 · Updated 3 months ago
contrastive / FreeVideoLLM
☆83 · Updated last year
TencentARC / TokLIP
TokLIP: Marry Visual Tokens to CLIP for Multimodal Comprehension and Generation
☆236 · Updated 5 months ago
Osilly / Interleaving-Reasoning-Generation
[ICLR 2026] This is an early exploration to introduce Interleaving Reasoning to the Text-to-Image Generation field and achieve the SoTA bench…
☆86 · Updated 2 weeks ago
zhuangshaobin / WeTok
WeTok: Powerful Discrete Tokenization for High-Fidelity Visual Reconstruction
☆57 · Updated 5 months ago
PKU-YuanGroup / LLMBind
LLMBind: A Unified Modality-Task Integration Framework
☆19 · Updated last year
YU-deep / VisMem
☆66 · Updated this week
hshjerry / VideoEspresso
[CVPR 2025 Oral] VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection
☆134 · Updated 6 months ago
360CVGroup / PlanGen
Unified layout planning and image generation, ICCV 2025
☆40 · Updated 3 weeks ago
knightyxp / DGL
[AAAI 2024] DGL: Dynamic Global-Local Prompt Tuning for Text-Video Retrieval.
☆47 · Updated last year
showlab / VideoLISA
[NeurIPS 2024] One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos
☆145 · Updated last year
aniki-ly / FreeLong
[NeurIPS 2024] The official implementation of the research paper "FreeLong : Training-Free Long Video Generation with SpectralBlend Temporal Atten…
☆64 · Updated 7 months ago
inclusionAI / Ming-UniVision
Code release for Ming-UniVision: Joint Image Understanding and Generation with a Continuous Unified Tokenizer
☆136 · Updated 3 months ago
MCG-NJU / VideoChat-Online
[CVPR 2025] Online Video Understanding: OVBench and VideoChat-Online
☆88 · Updated 4 months ago
PKU-YuanGroup / UAE
Official repository for the UAE paper, unified-GRPO, and unified-Bench
☆156 · Updated 4 months ago
Kwai-Klear / AR-GRPO
Training Autoregressive Image Generation models via Reinforcement Learning
☆50 · Updated 2 months ago
FAVOR-Bench / FAVOR-Bench
☆25 · Updated 2 months ago
wusize / OpenUni
☆176 · Updated 7 months ago
wusize / Harmon
[ICCV 2025] Code release of Harmonizing Visual Representations for Unified Multimodal Understanding and Generation
☆186 · Updated 8 months ago
zai-org / MotionBench
Official code for MotionBench (CVPR 2025)
☆63 · Updated 11 months ago
z-x-yang / DoraemonGPT
Official repository of DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models
☆88 · Updated last year
TencentARC / MindOmni
☆141 · Updated 3 months ago
nku-zhichengzhang / MART
[CVPR 2024] This is the official implementation of "MART: Masked Affective RepresenTation Learning via Masked Temporal Distribution Disti…
☆21 · Updated 7 months ago
gyxxyg / TRACE
[ICLR 2025] TRACE: Temporal Grounding Video LLM via Causal Event Modeling
☆143 · Updated 5 months ago
NJU-PCALab / InstanceCap
[CVPR 2025] InstanceCap: Improving Text-to-Video Generation via Instance-aware Structured Caption 🔍
☆46 · Updated 7 months ago
PhoenixZ810 / RISEBench
[NeurIPS 2025 D&B Oral] Official repository of the paper: Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing
☆140 · Updated this week