arctanxarc / UniCTokens
A framework for unified personalized models that achieves mutual enhancement between personalized understanding and generation, demonstrating the potential of cross-task information transfer in personalized scenarios and paving the way for general unified models.
☆121 · Updated last month
Alternatives and similar repositories for UniCTokens
Users interested in UniCTokens are comparing it to the libraries listed below.
- Official implementation of MC-LLaVA. ☆139 · Updated 3 weeks ago
- 🔥 CVPR 2025 Multimodal Large Language Models Paper List ☆153 · Updated 6 months ago
- 📖 A repository for organizing papers, code, and other resources related to unified multimodal models. ☆288 · Updated this week
- Official repository of "ScaleCap: Inference-Time Scalable Image Captioning via Dual-Modality Debiasing" ☆56 · Updated 2 months ago
- ☆120 · Updated 5 months ago
- WISE: A World Knowledge-Informed Semantic Evaluation for Text-to-Image Generation ☆146 · Updated last month
- [NeurIPS 2025 DB Oral] Official repository of the paper "Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing" ☆95 · Updated last week
- [CVPR 2025] BOLT: Boost Large Vision-Language Model Without Training for Long-form Video Understanding ☆31 · Updated 5 months ago
- TokLIP: Marry Visual Tokens to CLIP for Multimodal Comprehension and Generation ☆215 · Updated last month
- A collection of vision foundation models unifying understanding and generation. ☆57 · Updated 8 months ago
- ☆69 · Updated this week
- [ICML 2025] The code and data of the paper "Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation" ☆117 · Updated 10 months ago
- Official implementation of "Look-Back: Implicit Visual Re-focusing in MLLM Reasoning". ☆55 · Updated 2 months ago
- Official repo of the paper "Reconstruction Alignment Improves Unified Multimodal Models". Unlocking the Massive Zero-shot Potential in Unifie… ☆206 · Updated this week
- A tiny paper-rating web app ☆39 · Updated 6 months ago
- [LLaVA-Video-R1] ✨ First adaptation of R1 to LLaVA-Video (2025-03-18) ☆32 · Updated 4 months ago
- [NeurIPS 2025] Official repo of Omni-R1: Reinforcement Learning for Omnimodal Reasoning via Two-System Collaboration ☆81 · Updated 3 months ago
- Survey: https://arxiv.org/pdf/2507.20198 ☆139 · Updated last week
- [ICCV 2025] Code release of "Harmonizing Visual Representations for Unified Multimodal Understanding and Generation" ☆164 · Updated 4 months ago
- Think or Not Think: A Study of Explicit Thinking in Rule-Based Visual Reinforcement Fine-Tuning ☆63 · Updated 4 months ago
- Collections of papers and projects for multimodal reasoning. ☆106 · Updated 4 months ago
- [ICLR'25] Reconstructive Visual Instruction Tuning ☆116 · Updated 5 months ago
- TinyLLaVA-Video-R1: Towards Smaller LMMs for Video Reasoning ☆103 · Updated 3 months ago
- Official code of "Thinking With Videos: Multimodal Tool-Augmented Reinforcement Learning for Long Video Reasoning" ☆33 · Updated 3 weeks ago
- [CVPR 2025] Adaptive Keyframe Sampling for Long Video Understanding ☆103 · Updated 3 weeks ago
- A paper list for spatial reasoning ☆139 · Updated 3 months ago
- Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing ☆66 · Updated last month
- [ICLR'25] Official code for the paper "MLLMs Know Where to Look: Training-free Perception of Small Visual Details with Multimodal LLMs" ☆262 · Updated 5 months ago
- Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens (arXiv 2025) ☆155 · Updated last month
- Official repository for VisionZip (CVPR 2025) ☆347 · Updated 2 months ago