arctanxarc / UniCTokens
A framework for a unified personalized model that achieves mutual enhancement between personalized understanding and generation. It demonstrates the potential of cross-task information transfer in personalized scenarios, paving the way for the development of general unified models.
☆123 · Updated last month
Alternatives and similar repositories for UniCTokens
Users that are interested in UniCTokens are comparing it to the libraries listed below
- Official implementation of MC-LLaVA. ☆139 · Updated last week
- This is a repository for organizing papers, codes, and other resources related to unified multimodal models. ☆328 · Updated last month
- 🔥 CVPR 2025 Multimodal Large Language Models Paper List ☆155 · Updated 8 months ago
- [NIPS 2025 DB Oral] Official Repository of paper: Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing ☆114 · Updated last month
- This repository is the official implementation of "Look-Back: Implicit Visual Re-focusing in MLLM Reasoning". ☆69 · Updated 4 months ago
- TokLIP: Marry Visual Tokens to CLIP for Multimodal Comprehension and Generation ☆231 · Updated 3 months ago
- WISE: A World Knowledge-Informed Semantic Evaluation for Text-to-Image Generation ☆161 · Updated 2 weeks ago
- Official repository of 'ScaleCap: Inference-Time Scalable Image Captioning via Dual-Modality Debiasing' ☆57 · Updated 4 months ago
- ☆126 · Updated 8 months ago
- The official code of "Thinking With Videos: Multimodal Tool-Augmented Reinforcement Learning for Long Video Reasoning" ☆64 · Updated last month
- Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing ☆79 · Updated 3 months ago
- A collection of vision foundation models unifying understanding and generation. ☆58 · Updated 10 months ago
- [ICCV2025] Code Release of Harmonizing Visual Representations for Unified Multimodal Understanding and Generation ☆177 · Updated 6 months ago
- UniGenBench++: A Unified Semantic Evaluation Benchmark for Text-to-Image Generation ☆111 · Updated this week
- Official repository for the UAE paper, unified-GRPO, and unified-Bench ☆147 · Updated 2 months ago
- [CVPR 2025] Adaptive Keyframe Sampling for Long Video Understanding ☆132 · Updated 2 months ago
- Official release of "Spatial-SSRL: Enhancing Spatial Understanding via Self-Supervised Reinforcement Learning" ☆71 · Updated this week
- Offline implementation of UniREditBench: A Unified Reasoning-based Image Editing Benchmark. ☆40 · Updated this week
- 🔥🔥🔥 Latest Papers, Codes and Datasets on Video-LMM Post-Training ☆169 · Updated 3 weeks ago
- [ICML2025] The code and data of Paper: Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation ☆134 · Updated last year
- [ICLR'25] Reconstructive Visual Instruction Tuning ☆125 · Updated 7 months ago
- SpaceR: The first MLLM empowered by SG-RLVR for video spatial reasoning ☆97 · Updated 4 months ago
- [CVPR 2025] OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding? ☆103 · Updated 3 months ago
- [CVPR2025] BOLT: Boost Large Vision-Language Model Without Training for Long-form Video Understanding ☆33 · Updated 7 months ago
- [NeurIPS 2025 Spotlight] Think or Not Think: A Study of Explicit Thinking in Rule-Based Visual Reinforcement Fine-Tuning ☆73 · Updated 2 months ago
- [LLaVA-Video-R1] ✨ First Adaptation of R1 to LLaVA-Video (2025-03-18) ☆35 · Updated 6 months ago
- Official repo of paper "Reconstruction Alignment Improves Unified Multimodal Models". Unlocking the Massive Zero-shot Potential in Unifie… ☆310 · Updated last month
- Survey: https://arxiv.org/pdf/2507.20198 ☆203 · Updated 3 weeks ago
- Official repo for "Streaming Video Understanding and Multi-round Interaction with Memory-enhanced Knowledge" ICLR2025 ☆85 · Updated 8 months ago
- Collections of Papers and Projects for Multimodal Reasoning. ☆105 · Updated 6 months ago