Generate text captions for images from their embeddings.
☆119Aug 1, 2023Updated 2 years ago
Alternatives and similar repositories for clip-text-decoder
Users that are interested in clip-text-decoder are comparing it to the libraries listed below.
- [ECCV2022] Source Code for "Improving GANs for Long-Tailed Data through Group Spectral Regularization"☆16Oct 2, 2022Updated 3 years ago
- babyLM WhisBERT code☆19May 27, 2024Updated last year
- A PyTorch implementation of Attention Is All You Need (Transformer) for image captioning.☆12Nov 15, 2021Updated 4 years ago
- ☆11Oct 2, 2024Updated last year
- A PyTorch implementation of the paper 'Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering'☆10Jan 20, 2020Updated 6 years ago
- Implementation of Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic☆277Sep 17, 2022Updated 3 years ago
- ☆12Sep 19, 2021Updated 4 years ago
- t-vMF Similarity for Regularizing Intra-Class Feature Distribution☆21Jun 11, 2021Updated 4 years ago
- X-MIC: Cross-Modal Instance Conditioning for Egocentric Action Generalization, CVPR 2024☆11Nov 7, 2024Updated last year
- S-CLIP: Semi-supervised Vision-Language Pre-training using Few Specialist Captions☆51May 26, 2023Updated 2 years ago
- DDSP-FM: a differentiable FM synth based on Magenta's DDSP library.☆21Jun 14, 2021Updated 4 years ago
- Sparkles: Unlocking Chats Across Multiple Images for Multimodal Instruction-Following Models☆45Jun 14, 2024Updated last year
- Using LLMs and pre-trained caption models for super-human performance on image captioning.☆42Oct 13, 2023Updated 2 years ago
- Codes and scripts for "Explainable Semantic Space by Grounding Language to Vision with Cross-Modal Contrastive Learning"☆20Mar 23, 2022Updated 4 years ago
- Synthesis of percussion sounds using sinusoidal modelling, DDSP noise synthesis, and a neural source filter approach.☆34Jan 7, 2025Updated last year
- CLIP is an open source, multimodal computer vision model and it's awesome!☆17Dec 16, 2024Updated last year
- [ICCV2025] V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding☆60Apr 4, 2026Updated last month
- Code for the paper "Multi-Task Learning of Object States and State-Modifying Actions from Web Videos" published in TPAMI☆11Mar 3, 2024Updated 2 years ago
- InstructionGPT-4☆42Dec 29, 2023Updated 2 years ago
- Music Demixing Challenge Submission Repo☆16Sep 8, 2023Updated 2 years ago
- Frozen Pretrained Transformers for Neural Sign Language Translation☆15Apr 23, 2022Updated 4 years ago
- Official implementation for Diffusion Models Are Innate One-Step Generators☆26Jun 25, 2025Updated 10 months ago
- ☆20May 3, 2025Updated last year
- ☆59Aug 30, 2023Updated 2 years ago
- Code for ICLR 2023 Paper, "Stable Target Field for Reduced Variance Score Estimation in Diffusion Models"☆76Jun 6, 2023Updated 2 years ago
- Using pretrained encoder and language models to generate captions from multimedia inputs.☆100Mar 11, 2023Updated 3 years ago
- Learned User Representations in Online Social Networks (Twitter) using Temporal Dynamics of Information Diffusion.☆10Oct 15, 2018Updated 7 years ago
- Visualizing data to better monitor issues around food security☆14Nov 28, 2024Updated last year
- BEAR: a new BEnchmark on video Action Recognition☆46Apr 21, 2024Updated 2 years ago
- A repo for shared Jupyter and Colab notebooks☆23Jul 4, 2025Updated 10 months ago
- ☆17Dec 13, 2023Updated 2 years ago
- A Differentiable Acoustic Guitar Model for String-Specific Polyphonic Synthesis☆18Nov 16, 2023Updated 2 years ago
- [CVPR2023] This is an official implementation of paper "DETRs with Hybrid Matching".☆14Sep 1, 2022Updated 3 years ago
- ☆12Aug 14, 2020Updated 5 years ago
- [CVPR 2024 CVinW] Multi-Agent VQA: Exploring Multi-Agent Foundation Models on Zero-Shot Visual Question Answering☆22Sep 21, 2024Updated last year
- The codebase for Inducing Causal Structure for Interpretable Neural Networks☆11Dec 3, 2021Updated 4 years ago
- ☆36Nov 3, 2022Updated 3 years ago
- A Python3 program for converting Japanese words and numbers into phonemes.☆18Apr 24, 2018Updated 8 years ago
- Generate synthesizer sounds from text prompts with a simple evolutionary algorithm.☆28Jan 12, 2026Updated 3 months ago