m-bain / webvid
Large-scale text-video dataset. 10 million captioned short videos.
☆660 · Updated last year
Alternatives and similar repositories for webvid
Users interested in webvid are comparing it to the repositories listed below.
- Easily create large video dataset from video urls ☆634 · Updated last year
- [CVPR 2024] Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers ☆639 · Updated last year
- Multi-modality pre-training ☆503 · Updated last year
- Official implementation of SEED-LLaMA (ICLR 2024). ☆632 · Updated last year
- LaVIT: Empower the Large Language Model to Understand and Generate Visual Content ☆594 · Updated last year
- LVDM: Latent Video Diffusion Models for High-Fidelity Long Video Generation ☆495 · Updated 11 months ago
- A linear estimator on top of CLIP to predict the aesthetic quality of pictures ☆608 · Updated 3 years ago
- Open reproduction of MUSE for fast text2image generation. ☆355 · Updated last year
- Official repo for paper "MiraData: A Large-Scale Video Dataset with Long Durations and Structured Captions" ☆491 · Updated last year
- ☆557 · Updated 10 months ago
- An open source implementation of "Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning", an all-new multi modal … ☆361 · Updated last year
- Official Repository of ChatCaptioner ☆466 · Updated 2 years ago
- 🐟 Code and models for the NeurIPS 2023 paper "Generating Images with Multimodal Language Models". ☆467 · Updated last year
- Official repository for the paper PLLaVA ☆670 · Updated last year
- ☆628 · Updated last year
- LLM-grounded Diffusion: Enhancing Prompt Understanding of Text-to-Image Diffusion Models with Large Language Models (LLM-grounded Diffusi… ☆478 · Updated last year
- [NeurIPS'23] "MagicBrush: A Manually Annotated Dataset for Instruction-Guided Image Editing". ☆381 · Updated 8 months ago
- Mini-DALLE3: Interactive Text to Image by Prompting Large Language Models ☆313 · Updated last year
- Official JAX implementation of MAGVIT: Masked Generative Video Transformer ☆986 · Updated last year
- [IJCV] FastComposer: Tuning-Free Multi-Subject Image Generation with Localized Attention ☆710 · Updated 9 months ago
- ☆335 · Updated 2 years ago
- [CVPR 2024] MovieChat: From Dense Token to Sparse Memory for Long Video Understanding ☆664 · Updated 9 months ago
- EILeV: Eliciting In-Context Learning in Vision-Language Models for Videos Through Curated Data Distributional Properties ☆131 · Updated 11 months ago
- Get hundreds of millions of image+URL pairs from the crawling at home dataset and preprocess them ☆222 · Updated last year
- Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval [ICCV'21] ☆377 · Updated 3 years ago
- Unified Controllable Visual Generation Model ☆650 · Updated 9 months ago
- [ICCV2023 Oral] Unmasked Teacher: Towards Training-Efficient Video Foundation Models ☆338 · Updated last year
- Training-Free Structured Diffusion Guidance for Compositional Text-to-Image Synthesis ☆320 · Updated last year
- ☆196 · Updated last year
- Code release for "Learning Video Representations from Large Language Models" ☆537 · Updated 2 years ago