AMAP-ML / USP
USP: Unified Self-Supervised Pretraining for Image Generation and Understanding
☆62 Updated 2 weeks ago
Alternatives and similar repositories for USP:
Users who are interested in USP are comparing it to the repositories listed below:
- Next Token Is Enough: Realistic Image Quality and Aesthetic Scoring with Multimodal Large Language Model. ☆42 Updated last month
- VMBench: A Benchmark for Perception-Aligned Video Motion Generation ☆45 Updated last month
- GPG: A Simple and Strong Reinforcement Learning Baseline for Model Reasoning ☆119 Updated this week
- Code for: "Long-Context Autoregressive Video Modeling with Next-Frame Prediction" ☆197 Updated 2 weeks ago
- [CVPR 2025] InstanceCap: Improving Text-to-Video Generation via Instance-aware Structured Caption 🔍 ☆40 Updated last month
- ☆79 Updated last month
- ImageGen-CoT: Enhancing Text-to-Image In-context Learning with Chain-of-Thought Reasoning ☆31 Updated last month
- [CVPR 2025 (Oral)] Open implementation of "RandAR" ☆129 Updated last month
- Code and Data for "GenAI Arena: An Open Evaluation Platform for Generative Models" [NeurIPS 2024] ☆19 Updated 8 months ago
- [ICLR'25] Reconstructive Visual Instruction Tuning ☆81 Updated last month
- Empowering Unified MLLM with Multi-granular Visual Generation ☆119 Updated 3 months ago
- Official Repository of paper: Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing ☆44 Updated last month
- ☆25 Updated last month
- Official implementation of Unified Reward Model for Multimodal Understanding and Generation. ☆243 Updated this week
- Unifying Visual Understanding and Generation with Dual Visual Vocabularies 🌈 ☆43 Updated 3 weeks ago
- [NeurIPS 2024] The official implementation of the research paper "FreeLong: Training-Free Long Video Generation with SpectralBlend Temporal Atten… ☆44 Updated 2 months ago
- [CVPR 2025] CoDe: Collaborative Decoding Makes Visual Auto-Regressive Modeling Efficient ☆98 Updated last month
- [ICML2025] The code and data of Paper: Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation ☆101 Updated 6 months ago
- Official implementation of LiFT: Leveraging Human Feedback for Text-to-Video Model Alignment. ☆72 Updated this week
- ☆17 Updated last month
- p-MoD: Building Mixture-of-Depths MLLMs via Progressive Ratio Decay ☆35 Updated 4 months ago
- [CVPR 2025] 🔥 Official impl. of "TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation". ☆318 Updated 2 months ago
- WISE: A World Knowledge-Informed Semantic Evaluation for Text-to-Image Generation ☆81 Updated last month
- FQGAN: Factorized Visual Tokenization and Generation ☆50 Updated last month
- ☆30 Updated last month
- [CVPR2025] Number it: Temporal Grounding Videos like Flipping Manga ☆78 Updated last month
- ☆23 Updated last month
- A framework named B^2-DiffuRL for RL-based diffusion model fine-tuning. ☆29 Updated last month
- The Next Step Forward in Multimodal LLM Alignment ☆149 Updated last week
- A collection of vision foundation models unifying understanding and generation. ☆55 Updated 4 months ago