mutonix / Vript
☆154Updated 6 months ago
Alternatives and similar repositories for Vript
Users who are interested in Vript are comparing it to the repositories listed below
- Official repo for "VideoScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation" [EMNLP 2024]☆94Updated 5 months ago
- [ICLR 2025] AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark☆119Updated last month
- Official Implementation of ICLR'24: Kosmos-G: Generating Images in Context with Multimodal Large Language Models☆73Updated last year
- [NeurIPS 2024 D&B Track] Official Repo for "LVD-2M: A Long-take Video Dataset with Temporally Dense Captions"☆65Updated 9 months ago
- Empowering Unified MLLM with Multi-granular Visual Generation☆127Updated 6 months ago
- DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception☆150Updated 7 months ago
- [ECCV 2024] ShareGPT4V: Improving Large Multi-modal Models with Better Captions☆229Updated last year
- ☆93Updated last month
- WISE: A World Knowledge-Informed Semantic Evaluation for Text-to-Image Generation☆135Updated last month
- [NeurIPS 2024] VidProM: A Million-scale Real Prompt-Gallery Dataset for Text-to-Video Diffusion Models☆158Updated 10 months ago
- MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer☆230Updated last year
- Unified Multi-modal IAA Baseline and Benchmark☆82Updated 10 months ago
- [CVPR 2025] 🔥 Official impl. of "TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation".☆364Updated last week
- Official implementation of LiFT: Leveraging Human Feedback for Text-to-Video Model Alignment.☆79Updated 2 months ago
- VisionReward: Fine-Grained Multi-Dimensional Human Preference Learning for Image and Video Generation☆286Updated 4 months ago
- [ICLR 2025] OpenVid-1M: A Large-Scale High-Quality Dataset for Text-to-video Generation☆324Updated 2 months ago
- Code for MetaMorph: Multimodal Understanding and Generation via Instruction Tuning☆199Updated 3 months ago
- ☆133Updated last year
- [NeurIPS 2023] Customize spatial layouts for conditional image synthesis models, e.g., ControlNet, using GPT☆136Updated last year
- Official implementation of MARS: Mixture of Auto-Regressive Models for Fine-grained Text-to-image Synthesis☆85Updated last year
- ☆187Updated last year
- PyTorch implementation for the paper titled "SimpleAR: Pushing the Frontier of Autoregressive Visual Generation"☆390Updated last month
- [ICCV 2025] Code release of Harmonizing Visual Representations for Unified Multimodal Understanding and Generation☆145Updated 2 months ago
- [ACL 2024 Findings] "TempCompass: Do Video LLMs Really Understand Videos?", Yuanxin Liu, Shicheng Li, Yi Liu, Yuxiang Wang, Shuhuai Ren, …☆119Updated 3 months ago
- [CVPR 2024] EvalCrafter: Benchmarking and Evaluating Large Video Generation Models☆171Updated 10 months ago
- [CVPR 2024] CapsFusion: Rethinking Image-Text Data at Scale☆211Updated last year
- ☆138Updated 10 months ago
- FQGAN: Factorized Visual Tokenization and Generation☆52Updated 4 months ago
- ☆105Updated last year
- A Comprehensive Benchmark and Toolkit for Evaluating Video-based Large Language Models!☆131Updated last year