TheoCoombes / ClipCap
Using pretrained encoder and language models to generate captions from multimedia inputs.
☆94Updated 2 years ago
Alternatives and similar repositories for ClipCap:
Users who are interested in ClipCap are comparing it to the libraries listed below:
- L-Verse: Bidirectional Generation Between Image and Text☆108Updated 2 years ago
- Command-line tool for downloading and extending the RedCaps dataset.☆46Updated last year
- Use CLIP to represent video for Retrieval Task☆69Updated 4 years ago
- ECCV2020 paper: Fashion Captioning: Towards Generating Accurate Descriptions with Semantic Rewards. Code and Data.☆84Updated last year
- [BMVC22] Official Implementation of ViCHA: "Efficient Vision-Language Pretraining with Visual Concepts and Hierarchical Alignment"☆54Updated 2 years ago
- Data repository for the VALSE benchmark.☆37Updated last year
- Finetune glide-text2im from openai on your own data.☆89Updated 2 years ago
- Easily compute clip embeddings from video frames☆143Updated last year
- Simple script to compute CLIP-based scores given a DALL-E trained model.☆30Updated 3 years ago
- Script and models for clustering LAION-400m CLIP embeddings.☆25Updated 3 years ago
- Release of ImageNet-Captions☆45Updated 2 years ago
- Inverts CLIP text embeds to image embeds and visualizes with deep-image-prior.☆35Updated 2 years ago
- Let's make a video clip☆93Updated 2 years ago
- VideoCC is a dataset containing (video-URL, caption) pairs for training video-text machine learning models. It is created using an automa…☆77Updated 2 years ago
- ☆50Updated 2 years ago
- ☆47Updated 4 years ago
- Research code for "Training Vision-Language Transformers from Captions Alone"☆34Updated 2 years ago
- Aggregating embeddings over time☆31Updated 2 years ago
- ☆34Updated last year
- CLIP-Art: Contrastive Pre-training for Fine-Grained Art Classification - 4th Workshop on Computer Vision for Fashion, Art, and Design☆27Updated 2 years ago
- ☆64Updated last year
- scripts for running and training imagen-pytorch☆38Updated 2 years ago
- Training simple models to predict CLIP image embeddings from text embeddings, and vice versa.☆60Updated 2 years ago
- Efficiently read embedding in streaming from any filesystem☆98Updated 10 months ago
- CLOOB training (JAX) and inference (JAX and PyTorch)☆70Updated 2 years ago
- Implementation of the video diffusion model and training scheme presented in the paper, Flexible Diffusion Modeling of Long Videos, in Py…☆84Updated 2 years ago
- [NeurIPS 2022: Score-Based Modeling Workshop] Multiresolution Textual Inversion☆99Updated 2 years ago
- multimodal video-audio-text generation and retrieval between every pair of modalities on the MUGEN dataset. The repo. contains the traini…☆39Updated last year
- Official code repository for the EMNLP 2021 paper☆26Updated 3 years ago
- ☆75Updated 2 years ago