damian0815 / finetune-clip-huggingface
Finetuning CLIP on a small image/text dataset using huggingface libs
☆46 · Updated 2 years ago
Alternatives and similar repositories for finetune-clip-huggingface:
Users interested in finetune-clip-huggingface are comparing it to the libraries listed below:
- Official Pytorch implementation of "CompoDiff: Versatile Composed Image Retrieval With Latent Diffusion" (TMLR 2024) ☆83 · Updated last month
- OCR-VQGAN, a discrete image encoder (tokenizer and detokenizer) for figure images in Paper2Fig100k dataset. Implementation of OCR Percept… ☆79 · Updated 2 years ago
- [ACM TOMM 2023] - Composed Image Retrieval using Contrastive Learning and Task-oriented CLIP-based Features ☆174 · Updated last year
- [TMM 2023] VideoXum: Cross-modal Visual and Textural Summarization of Videos ☆43 · Updated 11 months ago
- FuseCap: Leveraging Large Language Models for Enriched Fused Image Captions ☆55 · Updated 11 months ago
- Code for "DreamEdit: Subject-driven Image Editing" (TMLR2023) ☆106 · Updated last year
- Dreambooth (LoRA) with well-organized code structure. Naive adaptation from 🤗Diffusers. ☆14 · Updated last year
- ACM MM'23 (oral), SUR-adapter for pre-trained diffusion models can acquire the powerful semantic understanding and reasoning capabilities… ☆119 · Updated 11 months ago
- Democratization of "PaLI: A Jointly-Scaled Multilingual Language-Image Model" ☆88 · Updated last year
- Code for Learning Subject-Aware Cropping by Outpainting Professional Photos ☆16 · Updated last year
- [CVPR 2023] VoP: Text-Video Co-operative Prompt Tuning for Cross-Modal Retrieval ☆38 · Updated 2 years ago
- InstructionGPT-4 ☆39 · Updated last year
- Image Editing Anything ☆113 · Updated last year
- (ACL'2023) MultiCapCLIP: Auto-Encoding Prompts for Zero-Shot Multilingual Visual Captioning ☆35 · Updated 7 months ago
- Reproducible scaling laws for contrastive language-image learning (https://arxiv.org/abs/2212.07143) ☆160 · Updated last year
- Official PyTorch implementation of "Scaling Up Personalized Image Aesthetic Assessment via Task Vector Customization" (ECCV 2024) ☆23 · Updated 3 weeks ago
- Visual Instruction-guided Explainable Metric. Code for "Towards Explainable Metrics for Conditional Image Synthesis Evaluation" (ACL 2024… ☆34 · Updated 4 months ago
- CLIPScore EMNLP code ☆218 · Updated 2 years ago
- ☆92 · Updated last year
- [CVPR 2024] Dynamic Prompt Optimizing for Text-to-Image Generation ☆68 · Updated 8 months ago
- Code for Shifted Diffusion for Text-to-image Generation (CVPR 2023) ☆162 · Updated last year
- The official repo for “TextCoT: Zoom In for Enhanced Multimodal Text-Rich Image Understanding”. ☆38 · Updated 6 months ago
- LLMScore: Unveiling the Power of Large Language Models in Text-to-Image Synthesis Evaluation ☆128 · Updated last year
- [ICCV 2023] - Zero-shot Composed Image Retrieval with Textual Inversion ☆169 · Updated 10 months ago
- Precision Search through Multi-Style Inputs ☆65 · Updated 8 months ago
- An implementation of fine-tuning the BLIP model for Visual Question Answering ☆64 · Updated last year
- [CVPR 2023 (Highlight)] FAME-ViL: Multi-Tasking V+L Model for Heterogeneous Fashion Tasks ☆53 · Updated last year
- [PR 2024] A large Cross-Modal Video Retrieval Dataset with Reading Comprehension ☆26 · Updated last year
- ALIGN trained on COYO-dataset ☆29 · Updated 11 months ago
- Densely Captioned Images (DCI) dataset repository. ☆175 · Updated 8 months ago