damian0815 / finetune-clip-huggingface
Finetuning CLIP on a small image/text dataset using huggingface libs
☆52Updated 2 years ago
Alternatives and similar repositories for finetune-clip-huggingface
Users who are interested in finetune-clip-huggingface compare it to the repositories listed below.
- OCR-VQGAN, a discrete image encoder (tokenizer and detokenizer) for figure images in Paper2Fig100k dataset. Implementation of OCR Percept…☆82Updated 2 years ago
- Reproducible scaling laws for contrastive language-image learning (https://arxiv.org/abs/2212.07143)☆183Updated 5 months ago
- [ACM TOMM 2023] - Composed Image Retrieval using Contrastive Learning and Task-oriented CLIP-based Features☆188Updated 2 years ago
- Image Editing Anything☆116Updated 2 years ago
- Fine tuning OpenAI's CLIP model on Indian Fashion Dataset☆52Updated 2 years ago
- Search photos on Unsplash based on OpenAI's CLIP model; supports search with joint image+text queries and attention visualization.☆223Updated 4 years ago
- [NeurIPS 2023] Text data, code and pre-trained models for paper "Improving CLIP Training with Language Rewrites"☆286Updated last year
- CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts☆159Updated last year
- The official PyTorch implementation for arXiv'23 paper 'LayoutDETR: Detection Transformer Is a Good Multimodal Layout Designer'☆101Updated 6 months ago
- PyTorch code for "Fine-grained Image Captioning with CLIP Reward" (Findings of NAACL 2022)☆246Updated 6 months ago
- Official PyTorch implementation of "CompoDiff: Versatile Composed Image Retrieval With Latent Diffusion" (TMLR 2024)☆88Updated 10 months ago
- [CVPR 2023 (Highlight)] FAME-ViL: Multi-Tasking V+L Model for Heterogeneous Fashion Tasks☆55Updated 2 years ago
- A curated list of text-based image manipulation methods.☆84Updated last year
- Democratization of "PaLI: A Jointly-Scaled Multilingual Language-Image Model"☆92Updated last year
- Official PyTorch implementation of LinCIR: Language-only Training of Zero-shot Composed Image Retrieval (CVPR 2024)☆137Updated last year
- [ECCV 2024] Parrot Captions Teach CLIP to Spot Text☆66Updated last year
- This repository is the code of our paper "DiffUTE: Universal Text Editing Diffusion Model" (NeurIPS'2023).☆143Updated 8 months ago
- ☆93Updated 2 years ago
- CLIPScore EMNLP code☆241Updated 2 years ago
- Generate text captions for images from their embeddings.☆116Updated 2 years ago
- (ACL'2023) MultiCapCLIP: Auto-Encoding Prompts for Zero-Shot Multilingual Visual Captioning☆36Updated last year
- 🎁 A Large-scale Multi-modal E-Commerce Products Dataset (LTDL@IJCAI-21 Best Dataset & Pattern Recognition 2023)☆38Updated last year
- CapDec: SOTA Zero Shot Image Captioning Using CLIP and GPT2, EMNLP 2022 (findings)☆202Updated last year
- LLMScore: Unveiling the Power of Large Language Models in Text-to-Image Synthesis Evaluation☆134Updated 2 years ago
- FuseCap: Leveraging Large Language Models for Enriched Fused Image Captions☆56Updated last year
- A curated list of video-text datasets in a variety of languages. These datasets can be used for video captioning (video description) or v…☆39Updated last year
- Implementation of PALI3 from the paper "PALI-3 VISION LANGUAGE MODELS: SMALLER, FASTER, STRONGER"☆146Updated last month
- [NeurIPS2023] This is the official code of the paper "GlyphControl: Glyph Conditional Control for Visual Text Generation"☆237Updated last year
- [ICCV 2023] - Zero-shot Composed Image Retrieval with Textual Inversion☆195Updated 4 months ago
- Implementation of Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic☆280Updated 3 years ago