shxie2020 / Awesome-UGVFM
A collection of vision foundation models unifying understanding and generation.
☆32 · Updated last week
Alternatives and similar repositories for Awesome-UGVFM:
Users who are interested in Awesome-UGVFM are comparing it to the repositories listed below:
- Liquid: Language Models are Scalable Multi-modal Generators ☆57 · Updated 3 weeks ago
- Empowering Unified MLLM with Multi-granular Visual Generation ☆113 · Updated 2 months ago
- [NeurIPS 2024] One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos ☆94 · Updated 2 weeks ago
- Official code for paper: Auto Cherry-Picker: Learning from High-quality Generative Data Driven by Language ☆22 · Updated 6 months ago
- Official Implementation of ICLR'24: Kosmos-G: Generating Images in Context with Multimodal Large Language Models ☆57 · Updated 7 months ago
- [ECCV 2024] Official PyTorch implementation of DreamLIP: Language-Image Pre-training with Long Captions ☆120 · Updated last month
- FQGAN: Factorized Visual Tokenization and Generation ☆39 · Updated this week
- Code for ROICtrl: Boosting Instance Control for Visual Generation ☆99 · Updated last month
- VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection ☆40 · Updated last week
- Replication in Visual Diffusion Models: A Survey and Outlook ☆26 · Updated 5 months ago
- 🔥 Official impl. of "TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation". ☆219 · Updated last week
- [NeurIPS 2024] Visual Perception by Large Language Model’s Weights ☆33 · Updated 2 months ago
- [NeurIPS 2024] The official implementation of the research paper "FreeLong: Training-Free Long Video Generation with SpectralBlend Temporal Atten… ☆34 · Updated last month
- This is a repo to track the latest autoregressive visual generation papers. ☆96 · Updated last week
- The code and data for the paper: Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation ☆77 · Updated 2 months ago
- A collection of awesome autoregressive visual generation models ☆59 · Updated last week
- CoDe: Collaborative Decoding Makes Visual Auto-Regressive Modeling Efficient ☆75 · Updated last month
- [ECCV 2024] ControlCap: Controllable Region-level Captioning ☆59 · Updated 2 months ago
- InstanceCap: Improving Text-to-Video Generation via Instance-aware Structured Caption 🔍 ☆28 · Updated 3 weeks ago
- [NeurIPS 2024 D&B Track] Official Repo for "LVD-2M: A Long-take Video Dataset with Temporally Dense Captions" ☆45 · Updated 2 months ago
- Open implementation of "RandAR" ☆46 · Updated last week
- ☆96 · Updated 3 weeks ago
- The official repository for paper "PruneVid: Visual Token Pruning for Efficient Video Large Language Models". ☆22 · Updated 2 weeks ago
- FreeVA: Offline MLLM as Training-Free Video Assistant ☆54 · Updated 7 months ago
- DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception ☆131 · Updated last month
- CAR: Controllable AutoRegressive Modeling for Visual Generation ☆90 · Updated last month
- Official code for CVPR 2024 paper: Discriminative Probing and Tuning for Text-to-Image Generation ☆27 · Updated last month
- [NeurIPS 2024] Official PyTorch implementation of LoTLIP: Improving Language-Image Pre-training for Long Text Understanding ☆39 · Updated 3 weeks ago
- ☆42 · Updated last week
- Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want ☆63 · Updated 2 months ago