iejMac / video2dataset
Easily create large video datasets from video URLs
☆586 Updated 7 months ago
Alternatives and similar repositories for video2dataset:
Users who are interested in video2dataset are comparing it to the libraries listed below:
- Large-scale text-video dataset. 10 million captioned short videos. ☆627 Updated 7 months ago
- [CVPR 2024] Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers ☆586 Updated 4 months ago
- A linear estimator on top of clip to predict the aesthetic quality of pictures ☆529 Updated 2 years ago
- ☆490 Updated 3 months ago
- Multi-modality pre-training ☆487 Updated 10 months ago
- Implementation of MagViT2 Tokenizer in Pytorch ☆597 Updated 2 months ago
- 🐟 Code and models for the NeurIPS 2023 paper "Generating Images with Multimodal Language Models". ☆448 Updated last year
- LVDM: Latent Video Diffusion Models for High-Fidelity Long Video Generation ☆471 Updated 4 months ago
- Official implementation of SEED-LLaMA (ICLR 2024). ☆604 Updated 6 months ago
- An open source implementation of "Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning", an all-new multi modal … ☆359 Updated last year
- Code release for "Learning Video Representations from Large Language Models" ☆512 Updated last year
- Official Repository of ChatCaptioner ☆462 Updated last year
- Open reproduction of MUSE for fast text2image generation. ☆347 Updated 9 months ago
- Easily compute clip embeddings from video frames ☆143 Updated last year
- [CVPR 2024] MovieChat: From Dense Token to Sparse Memory for Long Video Understanding ☆600 Updated last month
- Official repo for paper "MiraData: A Large-Scale Video Dataset with Long Durations and Structured Captions" ☆415 Updated 6 months ago
- [ICCV2023 Oral] Unmasked Teacher: Towards Training-Efficient Video Foundation Models ☆321 Updated 9 months ago
- DataComp: In search of the next generation of multimodal datasets ☆687 Updated last year
- Fine-tuning "ImageBind One Embedding Space to Bind Them All" with LoRA ☆181 Updated last year
- Zero-Shot Video Editing Using Off-The-Shelf Image Diffusion Models ☆354 Updated last year
- SEED-Voken: A Series of Powerful Visual Tokenizers ☆844 Updated last month
- ☆602 Updated last year
- This repo contains the code for 1D tokenizer and generator ☆748 Updated last week
- Implementation of Muse: Text-to-Image Generation via Masked Generative Transformers, in Pytorch ☆891 Updated last year
- 🧀 Code and models for the ICML 2023 paper "Grounding Language Models to Images for Multimodal Inputs and Outputs". ☆479 Updated last year
- LaVIT: Empower the Large Language Model to Understand and Generate Visual Content ☆569 Updated 5 months ago
- Official Implementation for "Attend-and-Excite: Attention-Based Semantic Guidance for Text-to-Image Diffusion Models" (SIGGRAPH 2023) ☆724 Updated last year
- Official JAX implementation of MAGVIT: Masked Generative Video Transformer ☆980 Updated last year
- [ICLR 2024] Code for FreeNoise based on VideoCrafter ☆399 Updated 8 months ago
- Training-Free Structured Diffusion Guidance for Compositional Text-to-Image Synthesis ☆313 Updated last year