TencentARC / TokLIP
TokLIP: Marry Visual Tokens to CLIP for Multimodal Comprehension and Generation
☆206 · Updated last month
Alternatives and similar repositories for TokLIP
Users who are interested in TokLIP are comparing it to the repositories listed below.
- WISE: A World Knowledge-Informed Semantic Evaluation for Text-to-Image Generation ☆146 · Updated last month
- ICML2025 ☆57 · Updated 3 weeks ago
- [ICCV2025] Code Release of Harmonizing Visual Representations for Unified Multimodal Understanding and Generation ☆162 · Updated 3 months ago
- [ICLR 2025] AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark ☆127 · Updated 3 months ago
- Structured Video Comprehension of Real-World Shorts ☆193 · Updated this week
- Official Repository of paper: Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing ☆89 · Updated last week
- [CVPRW 2025] UniToken is an auto-regressive generation model that combines discrete and continuous representations to process visual inpu… ☆92 · Updated 4 months ago
- The official code of "Thinking With Videos: Multimodal Tool-Augmented Reinforcement Learning for Long Video Reasoning" ☆33 · Updated 3 weeks ago
- [CVPR 2025] OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding? ☆85 · Updated last month
- Official repository of 'ScaleCap: Inference-Time Scalable Image Captioning via Dual-Modality Debiasing' ☆56 · Updated 2 months ago
- Uni-CoT: Towards Unified Chain-of-Thought Reasoning Across Text and Vision ☆114 · Updated last week
- Empowering Unified MLLM with Multi-granular Visual Generation ☆130 · Updated 8 months ago
- [ICLR'25] Reconstructive Visual Instruction Tuning ☆116 · Updated 5 months ago
- ☆154 · Updated 2 months ago
- [CVPR 2025 Oral] VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection ☆115 · Updated last month
- GoT-R1: Unleashing Reasoning Capability of MLLM for Visual Generation with Reinforcement Learning ☆96 · Updated 3 months ago
- The code repository of UniRL ☆40 · Updated 3 months ago
- [CVPR 2025] 🔥 Official impl. of "TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation". ☆379 · Updated last month
- [CVPR 2025] Adaptive Keyframe Sampling for Long Video Understanding ☆103 · Updated 3 weeks ago
- Ego-R1: Chain-of-Tool-Thought for Ultra-Long Egocentric Video Reasoning ☆117 · Updated 3 weeks ago
- VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning ☆184 · Updated last month
- Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning? ☆73 · Updated 2 months ago
- ☆126 · Updated 3 months ago
- Code for MetaMorph Multimodal Understanding and Generation via Instruction Tuning ☆209 · Updated 5 months ago
- VideoREPA: Learning Physics for Video Generation through Relational Alignment with Foundation Models ☆56 · Updated 3 months ago
- 📖 This is a repository for organizing papers, codes, and other resources related to unified multimodal models. ☆288 · Updated last week
- MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models ☆41 · Updated 5 months ago
- ☆58 · Updated 4 months ago
- [ICCV25] USP: Unified Self-Supervised Pretraining for Image Generation and Understanding ☆89 · Updated 2 months ago
- Official repo of paper "Reconstruction Alignment Improves Unified Multimodal Models". Unlocking the Massive Zero-shot Potential in Unifie… ☆206 · Updated this week