m-bain / webvid
Large-scale text-video dataset. 10 million captioned short videos.
☆602 Updated 3 months ago
Related projects
Alternatives and complementary repositories for webvid
- Easily create large video dataset from video urls ☆546 Updated 3 months ago
- Open reproduction of MUSE for fast text2image generation. ☆332 Updated 5 months ago
- ☆442 Updated 9 months ago
- [CVPR 2024] Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers ☆526 Updated 3 weeks ago
- Official implementation of SEED-LLaMA (ICLR 2024). ☆579 Updated 2 months ago
- Official repo for paper "MiraData: A Large-Scale Video Dataset with Long Durations and Structured Captions" ☆371 Updated 2 months ago
- Multi-modality pre-training ☆471 Updated 6 months ago
- LaVIT: Empower the Large Language Model to Understand and Generate Visual Content ☆532 Updated last month
- Better Aligning Text-to-Image Models with Human Preference. ICCV 2023 ☆266 Updated last year
- LVDM: Latent Video Diffusion Models for High-Fidelity Long Video Generation ☆455 Updated this week
- Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval [ICCV'21] ☆350 Updated 2 years ago
- A linear estimator on top of clip to predict the aesthetic quality of pictures ☆487 Updated 2 years ago
- Training-Free Structured Diffusion Guidance for Compositional Text-to-Image Synthesis ☆310 Updated last year
- Official PyTorch implementation of TATS: A Long Video Generation Framework with Time-Agnostic VQGAN and Time-Sensitive Transformer (ECCV … ☆268 Updated 6 months ago
- An open source implementation of "Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning", an all-new multi modal … ☆362 Updated 11 months ago
- Multimodal Models in Real World ☆403 Updated 3 weeks ago
- Implementation of MagViT2 Tokenizer in Pytorch ☆564 Updated last month
- Official PyTorch implementation of the paper "In-Context Learning Unlocked for Diffusion Models" ☆379 Updated 7 months ago
- [ICLR 2024] Code for FreeNoise based on VideoCrafter ☆386 Updated 4 months ago
- 🐟 Code and models for the NeurIPS 2023 paper "Generating Images with Multimodal Language Models". ☆433 Updated 10 months ago
- Human Preference Score v2: A Solid Benchmark for Evaluating Human Preferences of Text-to-Image Synthesis ☆399 Updated 5 months ago
- Official Repository of ChatCaptioner ☆452 Updated last year
- [CVPR2024 Highlight] VBench - We Evaluate Video Generation ☆580 Updated 2 weeks ago
- Official Pytorch Implementation of Synthesizing Coherent Story with Auto-Regressive Latent Diffusion Models ☆192 Updated last year
- Conceptual 12M is a dataset containing (image-URL, caption) pairs collected for vision-and-language pre-training. ☆368 Updated last year
- ELITE: Encoding Visual Concepts into Textual Embeddings for Customized Text-to-Image Generation (ICCV 2023, Oral) ☆514 Updated 10 months ago
- Official repository for the paper PLLaVA ☆593 Updated 3 months ago
- [SIGGRAPH Asia 2024] ReVersion: Diffusion-Based Relation Inversion from Images ☆468 Updated last month
- LLM-grounded Diffusion: Enhancing Prompt Understanding of Text-to-Image Diffusion Models with Large Language Models (LLM-grounded Diffusi… ☆435 Updated 2 months ago
- ☆349 Updated last month