KlingTeam / MODALinks
[ICML 2025 Spotlight] MODA: MOdular Duplex Attention for Multimodal Perception, Cognition, and Emotion Understanding
☆66Updated 6 months ago
Alternatives and similar repositories for MODA
Users that are interested in MODA are comparing it to the libraries listed below
Sorting:
- [CVPR 2025] RAP: Retrieval-Augmented Personalization☆79Updated 2 months ago
- ☆32Updated last year
- ☆83Updated last year
- Structured Video Comprehension of Real-World Shorts☆230Updated 4 months ago
- [ICCV 2025 Oral] Official implementation of Learning Streaming Video Representation via Multitask Training.☆80Updated last month
- [CVPR 2025 Oral] VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection☆134Updated 6 months ago
- This is the official implementation of 2024 CVPR paper "EmoGen: Emotional Image Content Generation with Text-to-Image Diffusion Models".☆92Updated 3 months ago
- WeTok: Powerful Discrete Tokenization for High-Fidelity Visual Reconstruction☆57Updated 5 months ago
- TokLIP: Marry Visual Tokens to CLIP for Multimodal Comprehension and Generation☆236Updated 5 months ago
- ☆141Updated 3 months ago
- [ICLR 2025] AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark☆138Updated 8 months ago
- Official repository for the UAE paper, unified-GRPO, and unified-Bench☆156Updated 4 months ago
- Video Generation Benchmark☆68Updated 7 months ago
- [ICCV2025]Code Release of Harmonizing Visual Representations for Unified Multimodal Understanding and Generation☆186Updated 8 months ago
- [ICLR 2026] This is an early exploration to introduce Interleaving Reasoning to Text-to-image Generation field and achieve the SoTA bench…☆86Updated last week
- ☆64Updated 2 months ago
- [CVPR 2025] Online Video Understanding: OVBench and VideoChat-Online☆85Updated 3 months ago
- ☆54Updated last year
- [ICLR 2025] TRACE: Temporal Grounding Video LLM via Casual Event Modeling☆143Updated 5 months ago
- [NeurIPS'25 Spotlight] MJ-VIDEO: Fine-Grained Benchmarking and Rewarding Video Preferences in Video Generation☆19Updated 11 months ago
- [NeurlPS 2024] One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos☆145Updated last year
- [NeurIPS 2024] The official implement of research paper "FreeLong : Training-Free Long Video Generation with SpectralBlend Temporal Atten…☆64Updated 7 months ago
- Unified layout planning and image generation, ICCV2025☆40Updated 2 weeks ago
- [AAAI 2024] DGL: Dynamic Global-Local Prompt Tuning for Text-Video Retrieval.☆47Updated last year
- [CVPR 2025] InstanceCap: Improving Text-to-Video Generation via Instance-aware Structured Caption 🔍☆46Updated 7 months ago
- Official repository of DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models☆88Updated last year
- Official code for NeurIPS 2025 paper "GRIT: Teaching MLLMs to Think with Images"☆173Updated 3 weeks ago
- (NeurIPS 2025) Vision Foundation Models as Effective Visual Tokenizers for Autoregressive Image Generation☆62Updated 3 months ago
- [NeurIPS 2024] MoME: Mixture of Multimodal Experts for Generalist Multimodal Large Language Models☆79Updated last month
- [NIPS 2025 DB Oral] Official Repository of paper: Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing☆140Updated this week