KlingTeam / MODA
[ICML 2025 Spotlight] MODA: MOdular Duplex Attention for Multimodal Perception, Cognition, and Emotion Understanding
☆66Updated 6 months ago
Alternatives and similar repositories for MODA
Users who are interested in MODA are comparing it to the repositories listed below.
- [CVPR 2025] RAP: Retrieval-Augmented Personalization☆79Updated 2 months ago
- ☆32Updated last year
- Structured Video Comprehension of Real-World Shorts☆230Updated 4 months ago
- [ICCV 2025 Oral] Official implementation of Learning Streaming Video Representation via Multitask Training.☆80Updated last month
- This is the official implementation of 2024 CVPR paper "EmoGen: Emotional Image Content Generation with Text-to-Image Diffusion Models".☆92Updated 3 months ago
- ☆83Updated last year
- TokLIP: Marry Visual Tokens to CLIP for Multimodal Comprehension and Generation☆236Updated 5 months ago
- [ICLR 2026] This is an early exploration to introduce Interleaving Reasoning to Text-to-image Generation field and achieve the SoTA bench…☆86Updated 2 weeks ago
- WeTok: Powerful Discrete Tokenization for High-Fidelity Visual Reconstruction☆57Updated 5 months ago
- LLMBind: A Unified Modality-Task Integration Framework☆19Updated last year
- ☆66Updated this week
- [CVPR 2025 Oral] VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection☆134Updated 6 months ago
- Unified layout planning and image generation, ICCV 2025☆40Updated 3 weeks ago
- [AAAI 2024] DGL: Dynamic Global-Local Prompt Tuning for Text-Video Retrieval.☆47Updated last year
- [NeurIPS 2024] One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos☆145Updated last year
- [NeurIPS 2024] The official implementation of the research paper "FreeLong: Training-Free Long Video Generation with SpectralBlend Temporal Atten…☆64Updated 7 months ago
- Code release for Ming-UniVision: Joint Image Understanding and Generation with a Continuous Unified Tokenizer☆136Updated 3 months ago
- [CVPR 2025] Online Video Understanding: OVBench and VideoChat-Online☆88Updated 4 months ago
- Official repository for the UAE paper, unified-GRPO, and unified-Bench☆156Updated 4 months ago
- Training Autoregressive Image Generation models via Reinforcement Learning☆50Updated 2 months ago
- ☆25Updated 2 months ago
- ☆176Updated 7 months ago
- [ICCV 2025] Code release of Harmonizing Visual Representations for Unified Multimodal Understanding and Generation☆186Updated 8 months ago
- Official code for MotionBench (CVPR 2025)☆63Updated 11 months ago
- Official repository of DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models☆88Updated last year
- ☆141Updated 3 months ago
- [CVPR 2024] This is the official implementation of "MART: Masked Affective RepresenTation Learning via Masked Temporal Distribution Disti…☆21Updated 7 months ago
- [ICLR 2025] TRACE: Temporal Grounding Video LLM via Causal Event Modeling☆143Updated 5 months ago
- [CVPR 2025] InstanceCap: Improving Text-to-Video Generation via Instance-aware Structured Caption 🔍☆46Updated 7 months ago
- [NeurIPS 2025 DB Oral] Official repository of the paper: Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing☆140Updated this week