KwaiVGI / MODA
[ICML 2025 Spotlight] MODA: MOdular Duplex Attention for Multimodal Perception, Cognition, and Emotion Understanding
☆46 · Updated 2 weeks ago
Alternatives and similar repositories for MODA
Users that are interested in MODA are comparing it to the libraries listed below
- ☆136 · Updated 3 weeks ago
- [ICCV 2025] Code release of Harmonizing Visual Representations for Unified Multimodal Understanding and Generation ☆143 · Updated 2 months ago
- [ICLR 2025] AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark ☆116 · Updated last month
- Official repository of the paper: Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing ☆78 · Updated this week
- [CVPR 2025] InstanceCap: Improving Text-to-Video Generation via Instance-aware Structured Caption 🔍 ☆44 · Updated 2 weeks ago
- Code for "Long-Context Autoregressive Video Modeling with Next-Frame Prediction" ☆232 · Updated 3 months ago
- ☆34 · Updated last month
- Official repo for "Streaming Video Understanding and Multi-round Interaction with Memory-enhanced Knowledge" (ICLR 2025) ☆61 · Updated 4 months ago
- [NeurIPS 2024] One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos ☆123 · Updated 6 months ago
- Official code for MotionBench (CVPR 2025) ☆50 · Updated 4 months ago
- Official implementation of LiFT: Leveraging Human Feedback for Text-to-Video Model Alignment ☆79 · Updated 2 months ago
- Frequency Autoregressive Image Generation with Continuous Tokens ☆79 · Updated last month
- GoT-R1: Unleashing Reasoning Capability of MLLM for Visual Generation with Reinforcement Learning ☆90 · Updated last month
- ☆31 · Updated last year
- Official code repo of the CVPR 2025 paper PhyT2V: LLM-Guided Iterative Self-Refinement for Physics-Grounded Text-to-Video Generation ☆38 · Updated 4 months ago
- USP: Unified Self-Supervised Pretraining for Image Generation and Understanding ☆77 · Updated 3 weeks ago
- [NeurIPS 2024] The official implementation of the research paper "FreeLong: Training-Free Long Video Generation with SpectralBlend Temporal Atten… ☆54 · Updated 3 weeks ago
- Implements VAR+CLIP for text-to-image (T2I) generation ☆143 · Updated 6 months ago
- [CVPR 2025 (Oral)] Open implementation of "RandAR" ☆178 · Updated last week
- Official repository of "Inst-IT: Boosting Multimodal Instance Understanding via Explicit Visual Prompt Instruction Tuning" ☆34 · Updated 5 months ago
- ☆91 · Updated last month
- [CVPR 2025] OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding?