iejMac / video2dataset
Easily create large video datasets from video URLs
☆634Updated last year
Alternatives and similar repositories for video2dataset
Users that are interested in video2dataset are comparing it to the libraries listed below
- Large-scale text-video dataset. 10 million captioned short videos.☆660Updated last year
 - Open reproduction of MUSE for fast text2image generation.☆355Updated last year
 - [CVPR 2024] Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers☆639Updated last year
 - Official implementation of SEED-LLaMA (ICLR 2024).☆632Updated last year
 - Implementation of MagViT2 Tokenizer in Pytorch☆644Updated 9 months ago
 - An open source implementation of "Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning", an all-new multi modal …☆361Updated last year
 - ☆628Updated last year
 - Code release for "Learning Video Representations from Large Language Models"☆537Updated 2 years ago
 - 🐟 Code and models for the NeurIPS 2023 paper "Generating Images with Multimodal Language Models".☆467Updated last year
 - DataComp: In search of the next generation of multimodal datasets☆745Updated 6 months ago
 - Official JAX implementation of MAGVIT: Masked Generative Video Transformer☆986Updated last year
 - Official repo for paper "MiraData: A Large-Scale Video Dataset with Long Durations and Structured Captions"☆491Updated last year
 - ☆557Updated 10 months ago
 - Multi-modality pre-training☆503Updated last year
 - LaVIT: Empower the Large Language Model to Understand and Generate Visual Content☆594Updated last year
 - LVDM: Latent Video Diffusion Models for High-Fidelity Long Video Generation☆495Updated 11 months ago
 - Fine-tuning "ImageBind One Embedding Space to Bind Them All" with LoRA☆191Updated last year
 - A linear estimator on top of CLIP to predict the aesthetic quality of pictures☆608Updated 3 years ago
 - Implementation of Muse: Text-to-Image Generation via Masked Generative Transformers, in Pytorch☆912Updated last year
 - [CVPR 2024] MovieChat: From Dense Token to Sparse Memory for Long Video Understanding☆664Updated 9 months ago
 - Official Repository of ChatCaptioner☆466Updated 2 years ago
 - Implementation of Lumiere, SOTA text-to-video generation from Google Deepmind, in Pytorch☆280Updated last year
 - [NeurIPS 2023] This repository includes the official implementation of our paper "An Inverse Scaling Law for CLIP Training"☆318Updated last year
 - EILeV: Eliciting In-Context Learning in Vision-Language Models for Videos Through Curated Data Distributional Properties☆131Updated 11 months ago
 - Get hundreds of millions of image+URL pairs from the Crawling at Home dataset and preprocess them☆222Updated last year
 - Official repository for the paper PLLaVA☆670Updated last year
 - Official PyTorch implementation of the paper "In-Context Learning Unlocked for Diffusion Models"☆410Updated last year
 - 🧀 Code and models for the ICML 2023 paper "Grounding Language Models to Images for Multimodal Inputs and Outputs".☆482Updated 2 years ago
 - LLM-grounded Diffusion: Enhancing Prompt Understanding of Text-to-Image Diffusion Models with Large Language Models (LLM-grounded Diffusi…☆478Updated last year
 - [ICML 2024 Spotlight] FiT: Flexible Vision Transformer for Diffusion Model☆423Updated 11 months ago