DRSY / MoTIS
[NAACL 2022] Mobile text-to-image search powered by multimodal semantic representation models (e.g., OpenAI's CLIP)
☆124 · Updated last year
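The retrieval step shared by MoTIS and most of the CLIP-based projects listed below boils down to ranking gallery images by cosine similarity between a text embedding and precomputed image embeddings. A minimal sketch of that ranking step, assuming the embeddings have already been produced by CLIP's encoders (the `rank_images` helper and the toy vectors are illustrative, not part of any listed repo):

```python
import numpy as np

def rank_images(text_emb: np.ndarray, image_embs: np.ndarray, top_k: int = 3) -> np.ndarray:
    """Return indices of the top_k gallery images most similar to the text query.

    text_emb: (d,) embedding of the query text (e.g., from CLIP's text encoder).
    image_embs: (n, d) embeddings of the image gallery (from CLIP's image encoder).
    """
    # L2-normalize both sides so the dot product equals cosine similarity.
    t = text_emb / np.linalg.norm(text_emb)
    g = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    sims = g @ t                       # (n,) cosine similarities
    return np.argsort(-sims)[:top_k]   # indices of best matches, highest first

# Toy gallery: image 1 is aligned with the query, image 0 is orthogonal to it.
gallery = np.array([[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]])
query = np.array([0.6, 0.8])
print(rank_images(query, gallery, top_k=2))  # → [1 2]
```

On-device variants (such as the Core ML port below) follow the same pattern; the only difference is where the embeddings are computed and stored.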
Alternatives and similar repositories for MoTIS:
Users interested in MoTIS are comparing it to the repositories listed below.
- OpenAI CLIP Core ML version for iOS: text-image embeddings, image search, image clustering, and image classification ☆19 · Updated last year
- Using pretrained encoder and language models to generate captions from multimedia inputs. ☆95 · Updated 2 years ago
- Use CLIP to represent videos for retrieval tasks ☆69 · Updated 4 years ago
- Easily compute CLIP embeddings from video frames ☆143 · Updated last year
- Big-Interleaved-Dataset ☆58 · Updated 2 years ago
- ☆18 · Updated last year
- A sample project showing how to use MobileStyleGAN on iOS. ☆15 · Updated 3 years ago
- Filtering, Distillation, and Hard Negatives for Vision-Language Pre-Training ☆136 · Updated 2 years ago
- Let's make a video clip ☆93 · Updated 2 years ago
- Search photos on Unsplash with OpenAI's CLIP model; supports joint image+text queries and attention visualization. ☆217 · Updated 3 years ago
- ☆64 · Updated last year
- M4 experiment logbook ☆57 · Updated last year
- ☆57 · Updated last year
- A repository containing datasets and tools to train a watermark classifier. ☆66 · Updated 2 years ago
- Data repository for the VALSE benchmark. ☆37 · Updated last year
- Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval [ICCV 2021] ☆360 · Updated 2 years ago
- CapDec: SOTA zero-shot image captioning using CLIP and GPT-2, EMNLP 2022 (Findings) ☆192 · Updated last year
- Release of ImageNet-Captions ☆46 · Updated 2 years ago
- Repository for the data in the paper "Explain Me the Painting: Multi-Topic Knowledgeable Art Description Generation". ☆19 · Updated 3 years ago
- ECCV 2020 paper: Fashion Captioning: Towards Generating Accurate Descriptions with Semantic Rewards. Code and data. ☆84 · Updated last year
- VideoCC is a dataset of (video-URL, caption) pairs for training video-text machine learning models. It is created using an automa… ☆78 · Updated 2 years ago
- [CVPR 2022 - Demo Track] Effective conditioned and composed image retrieval combining CLIP-based features ☆78 · Updated 4 months ago
- ☆44 · Updated 3 years ago
- Chinese-language text encoder for CLIP ☆22 · Updated 2 years ago
- CVPR 2023 paper ☆50 · Updated last year
- PyTorch code for MUST ☆106 · Updated 2 years ago
- The official PyTorch implementation of the arXiv 2023 paper "LayoutDETR: Detection Transformer Is a Good Multimodal Layout Designer" ☆91 · Updated 2 months ago
- A non-JIT implementation / replication of OpenAI's CLIP in PyTorch ☆34 · Updated 4 years ago
- Implementation of DeepMind's Flamingo vision-language model, based on Hugging Face language models and ready for training ☆166 · Updated last year
- Command-line tool for downloading and extending the RedCaps dataset. ☆46 · Updated last year