SxJyJay / UniToken
[CVPRW 2025] UniToken is an auto-regressive generation model that combines discrete and continuous representations to process visual inputs, which lets it handle both visual understanding and image generation tasks in a single model.
☆61 Updated last week
Alternatives and similar repositories for UniToken:
Users who are interested in UniToken are comparing it to the repositories listed below:
- EventHallusion: Diagnosing Event Hallucinations in Video LLMs ☆30 Updated 3 months ago
- WISE: A World Knowledge-Informed Semantic Evaluation for Text-to-Image Generation ☆79 Updated last week
- ☆19 Updated 3 months ago
- [NeurIPS 2024] Lumen: a Large multimodal model with versatile vision-centric capabilities ☆24 Updated 6 months ago
- ☆72 Updated 3 weeks ago
- T2V-CompBench: A Comprehensive Benchmark for Compositional Text-to-video Generation ☆73 Updated 3 weeks ago
- Official implementation of LiFT: Leveraging Human Feedback for Text-to-Video Model Alignment. ☆70 Updated 3 weeks ago
- [CVPR 2025] 🔥 Official impl. of "TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation". ☆312 Updated last month
- This is the official implementation for ControlVAR. ☆102 Updated 4 months ago
- ☆22 Updated 3 weeks ago
- MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models ☆32 Updated last week
- Exposing Text-Image Inconsistency Using Diffusion Models (ICLR 2024) ☆10 Updated 10 months ago
- [CVPR 2025 Oral] Official Repo for Paper "AnyEdit: Mastering Unified High-Quality Image Editing for Any Idea" ☆99 Updated 2 weeks ago
- [ECCV 2024] API: Attention Prompting on Image for Large Vision-Language Models ☆82 Updated 6 months ago
- Official code for CVPR 2024 paper: Discriminative Probing and Tuning for Text-to-Image Generation ☆31 Updated 2 weeks ago
- Official repo for "VideoScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation" [EMNLP 2024] ☆89 Updated 2 months ago
- ImageGen-CoT: Enhancing Text-to-Image In-context Learning with Chain-of-Thought Reasoning ☆28 Updated 2 weeks ago
- [CVPR 2025 (Oral)] Open implementation of "RandAR" ☆107 Updated 3 weeks ago
- [CVPR 2024] Official implementation of CVPR 2024 paper: "Doubly Abductive Counterfactual Inference for Text-based Image Editing" ☆23 Updated last year
- [CVPR 2025] InstanceCap: Improving Text-to-Video Generation via Instance-aware Structured Caption 🔍 ☆39 Updated last week
- This is a repository to collect training-free algorithms for visual generation and manipulation ☆28 Updated this week
- The official repository for paper "PruneVid: Visual Token Pruning for Efficient Video Large Language Models". ☆35 Updated 2 months ago
- PyTorch implementation of InstructAny2Pix: Flexible Visual Editing via Multimodal Instruction Following ☆30 Updated 2 months ago
- [ICLR'25] Reconstructive Visual Instruction Tuning ☆79 Updated last week
- A collection of vision foundation models unifying understanding and generation. ☆50 Updated 3 months ago
- Official repository for CoMM Dataset ☆32 Updated 3 months ago
- Implements VAR+CLIP for text-to-image (T2I) generation ☆135 Updated 2 months ago
- Official Implementation of VideoGen-of-Thought: Step-by-step generating multi-shot video with minimal manual intervention ☆34 Updated this week
- Official Implementation of VideoDPO ☆84 Updated 3 months ago
- 📖 This is a repository for organizing papers, codes, and other resources related to unified multimodal models. ☆171 Updated last week