Using pretrained encoder and language models to generate captions from multimedia inputs.
☆100Mar 11, 2023Updated 2 years ago
Alternatives and similar repositories for ClipCap
Users who are interested in ClipCap are comparing it to the libraries listed below.
- I have created a dataset of image-text pairs by using the cosine similarity of the CLIP embeddings of the image & its caption derived f…☆16Apr 22, 2021Updated 4 years ago
- ☆21Mar 15, 2023Updated 2 years ago
- CaMEL: Mean Teacher Learning for Image Captioning. ICPR 2022☆29Dec 1, 2022Updated 3 years ago
- Refactoring dalle-pytorch and taming-transformers for TPU VM☆60Aug 30, 2021Updated 4 years ago
- Aim for the moon. If you miss, you may hit a star.☆164Feb 14, 2023Updated 3 years ago
- PyTorch code for "Fine-grained Image Captioning with CLIP Reward" (Findings of NAACL 2022)☆246Jun 10, 2025Updated 8 months ago
- Memory-efficient transformer. Work in progress.☆19Sep 17, 2022Updated 3 years ago
- Let's make a video clip☆96Jul 29, 2022Updated 3 years ago
- Simple image captioning model☆1,412Jun 9, 2024Updated last year
- Inverts CLIP text embeds to image embeds and visualizes with deep-image-prior.☆35Jul 3, 2022Updated 3 years ago
- Get hundreds of millions of image+url pairs from the crawling at home dataset and preprocess them☆223May 26, 2024Updated last year
- OpenAI CLIP text encoders for multiple languages!☆826May 15, 2023Updated 2 years ago
- ☆112Aug 5, 2021Updated 4 years ago
- Un-*** 50 billions multimodality dataset☆23Sep 14, 2022Updated 3 years ago
- Implementation of the DeepMind Flamingo vision-language model, based on Hugging Face language models and ready for training☆169Apr 27, 2023Updated 2 years ago
- Majesty Diffusion by @Dango233 and @apolinario (@multimodalart)☆25Jul 26, 2022Updated 3 years ago
- Efficiently read embeddings in streaming from any filesystem☆105Aug 9, 2025Updated 6 months ago
- jupyter/colab implementation of stable-diffusion using k_lms sampler, cpu draw manual seeding, and quantize.py fix☆38Aug 20, 2022Updated 3 years ago
- Finetune glide-text2im from openai on your own data.☆88Feb 28, 2026Updated last week
- ☆25Jul 10, 2023Updated 2 years ago
- CLOOB training (JAX) and inference (JAX and PyTorch)☆74May 16, 2022Updated 3 years ago
- EasyRLHF aims to provide an easy and minimal interface to train aligned language models, using off-the-shelf solutions and datasets☆10Dec 12, 2023Updated 2 years ago
- Dataset for Paper "Exploring Content Selection in Summarization of Novel Chapters"☆14Mar 20, 2023Updated 2 years ago
- Learning Algebraic Representation for Systematic Generalization in Abstract Reasoning☆11Jul 20, 2022Updated 3 years ago
- A Simple Framework for CV Pre-training Model (SOCO, VirTex, BEiT)☆15Oct 18, 2021Updated 4 years ago
- A PyTorch implementation of Proxy Anchor Loss based on CVPR 2020 paper "Proxy Anchor Loss for Deep Metric Learning"☆11Jan 16, 2021Updated 5 years ago
- ☆10Aug 25, 2019Updated 6 years ago
- KABooks is a tool to automate the process of creating datasets for training Text-To-Speech (TTS) and Speech-To-Text (STT) models. Using a…☆12Mar 24, 2023Updated 2 years ago
- Easily compute clip embeddings and build a clip retrieval system with them☆2,732Aug 15, 2025Updated 6 months ago
- Repository for "Generating images from caption and vice versa via CLIP-Guided Generative Latent Space Search"☆179Sep 30, 2021Updated 4 years ago
- ☆11Sep 7, 2020Updated 5 years ago
- Xfce Desktop container designed for direct access to the GPU with EGL using VirtualGL for GPUs. Does not require /tmp/.X11-unix host sock…☆10Jul 25, 2022Updated 3 years ago
- Stable diffusion google colab kernel☆10Aug 17, 2022Updated 3 years ago
- PyTorch code for MUST☆108May 1, 2025Updated 10 months ago
- StableDiffusion scripts based on huggingface diffusers.☆15Feb 23, 2025Updated last year
- Aggregating embeddings over time☆32Jan 19, 2023Updated 3 years ago
- CLOOB Conditioned Latent Diffusion training and inference code☆111Apr 15, 2022Updated 3 years ago
- The code of building a web demo for Auto_painter☆28Jun 2, 2020Updated 5 years ago
- RUDOLPH: One Hyper-Tasking Transformer can be creative as DALL-E and GPT-3 and smart as CLIP☆253Feb 6, 2023Updated 3 years ago