rongyaofang / PUMA
Empowering Unified MLLM with Multi-granular Visual Generation
☆117Updated last month
Alternatives and similar repositories for PUMA:
Users who are interested in PUMA are comparing it to the repositories listed below:
- Official repo for "VideoScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation" [EMNLP2024]☆73Updated last week
- Official implementation of MARS: Mixture of Auto-Regressive Models for Fine-grained Text-to-image Synthesis☆83Updated 7 months ago
- Official Implementation of ICLR'24: Kosmos-G: Generating Images in Context with Multimodal Large Language Models☆62Updated 8 months ago
- 🔥 Official impl. of "TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation".☆253Updated last month
- A collection of vision foundation models unifying understanding and generation.☆40Updated last month
- Liquid: Language Models are Scalable Multi-modal Generators☆65Updated 2 months ago
- T2V-CompBench: A Comprehensive Benchmark for Compositional Text-to-video Generation☆65Updated this week
- [NeurIPS 2024 D&B Track] Official Repo for "LVD-2M: A Long-take Video Dataset with Temporally Dense Captions"☆45Updated 4 months ago
- Official Implementation of VideoDPO☆48Updated last month
- [NeurIPS 2024] The official implementation of the research paper "FreeLong: Training-Free Long Video Generation with SpectralBlend Temporal Atten…☆36Updated 2 months ago
- [ICLR2025]☆137Updated 3 weeks ago
- Video Generation, Physical Commonsense, Semantic Adherence, VideoCon-Physics☆78Updated 2 weeks ago
- Code for ROICtrl: Boosting Instance Control for Visual Generation☆101Updated 2 months ago
- [ICLR 2025] AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark☆75Updated 3 weeks ago
- [NeurIPS 2024] One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos☆105Updated last month
- The official implementation for "MonoFormer: One Transformer for Both Diffusion and Autoregression"☆84Updated 4 months ago
- Code and data for the paper "Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation"☆85Updated 3 months ago
- FQGAN: Factorized Visual Tokenization and Generation☆42Updated last month
- This is a repo to track the latest autoregressive visual generation papers.☆139Updated last week
- RichHF-18K dataset contains rich human feedback labels we collected for our CVPR'24 paper: https://arxiv.org/pdf/2312.10240, along with t…☆115Updated 7 months ago
- [ICLR2025] MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models☆61Updated 5 months ago
- ☆134Updated last month
- [NeurIPS 2024] Video Diffusion Models are Training-free Motion Interpreter and Controller☆33Updated this week
- ☆17Updated last month
- CoDe: Collaborative Decoding Makes Visual Auto-Regressive Modeling Efficient☆75Updated 3 weeks ago
- The official implementation of PAR: Parallelized Autoregressive Visual Generation. https://epiphqny.github.io/PAR-project/☆110Updated last month
- [NeurIPS 2023 & TPAMI] T2I-CompBench (++) for Compositional Text-to-image Generation Evaluation☆234Updated 2 weeks ago
- Official implementation of LiFT: Leveraging Human Feedback for Text-to-Video Model Alignment.☆59Updated last month