damian0815 / finetune-clip-huggingface
Finetuning CLIP on a small image/text dataset using huggingface libs
☆52 Updated 3 years ago
Alternatives and similar repositories for finetune-clip-huggingface
Users who are interested in finetune-clip-huggingface are comparing it to the repositories listed below.
- OCR-VQGAN, a discrete image encoder (tokenizer and detokenizer) for figure images in Paper2Fig100k dataset. Implementation of OCR Percept… ☆82 Updated 3 years ago
- [ACM TOMM 2023] - Composed Image Retrieval using Contrastive Learning and Task-oriented CLIP-based Features ☆192 Updated 2 years ago
- Reproducible scaling laws for contrastive language-image learning (https://arxiv.org/abs/2212.07143) ☆188 Updated 7 months ago
- Official Pytorch implementation of "CompoDiff: Versatile Composed Image Retrieval With Latent Diffusion" (TMLR 2024) ☆88 Updated 11 months ago
- Official Pytorch implementation of LinCIR: Language-only Training of Zero-shot Composed Image Retrieval (CVPR 2024) ☆142 Updated 3 weeks ago
- [NeurIPS 2023] Text data, code and pre-trained models for paper "Improving CLIP Training with Language Rewrites" ☆287 Updated 2 years ago
- ☆93 Updated 2 years ago
- CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts ☆162 Updated last year
- [CVPR 2023 (Highlight)] FAME-ViL: Multi-Tasking V+L Model for Heterogeneous Fashion Tasks ☆55 Updated 2 years ago
- [ICCV 2023] - Zero-shot Composed Image Retrieval with Textual Inversion ☆196 Updated 6 months ago
- Democratization of "PaLI: A Jointly-Scaled Multilingual Language-Image Model" ☆93 Updated last year
- Fine tuning OpenAI's CLIP model on Indian Fashion Dataset ☆52 Updated 2 years ago
- Generate text captions for images from their embeddings. ☆118 Updated 2 years ago
- Search photos on Unsplash based on OpenAI's CLIP model, support search with joint image+text queries and attention visualization. ☆223 Updated 4 years ago
- The official PyTorch implementation for arXiv'23 paper 'LayoutDETR: Detection Transformer Is a Good Multimodal Layout Designer' ☆102 Updated 8 months ago
- A curated list of text-based image manipulation methods. ☆84 Updated last year
- Densely Captioned Images (DCI) dataset repository. ☆195 Updated last year
- [ECCV 2024] Parrot Captions Teach CLIP to Spot Text ☆66 Updated last year
- An implementation of fine-tuning the BLIP model for Visual Question Answering ☆83 Updated 2 years ago
- ACM MM'23 (oral), SUR-adapter for pre-trained diffusion models can acquire the powerful semantic understanding and reasoning capabilities… ☆120 Updated 4 months ago
- FuseCap: Leveraging Large Language Models for Enriched Fused Image Captions ☆55 Updated last year
- [CVPR 2024] Official implementation of "ViTamin: Designing Scalable Vision Models in the Vision-language Era" ☆211 Updated last year
- Visual Instruction-guided Explainable Metric. Code for "Towards Explainable Metrics for Conditional Image Synthesis Evaluation" (ACL 2024… ☆63 Updated last year
- This is a public repository for Image Clustering Conditioned on Text Criteria (IC|TC) ☆92 Updated last year
- Dreambooth (LoRA) with well-organized code structure. Naive adaptation from 🤗Diffusers. ☆15 Updated 2 years ago
- [NeurIPS2023] This is the official code of the paper "GlyphControl: Glyph Conditional Control for Visual Text Generation" ☆239 Updated last year
- Open source implementation of "A Self-Supervised Descriptor for Image Copy Detection" (SSCD). ☆380 Updated 3 years ago
- ☆34 Updated 2 years ago
- CLIPScore EMNLP code ☆244 Updated 3 years ago
- Implementation of PALI3 from the paper "PALI-3 VISION LANGUAGE MODELS: SMALLER, FASTER, STRONGER" ☆146 Updated 2 weeks ago