damian0815 / finetune-clip-huggingface
Finetuning CLIP on a small image/text dataset using huggingface libs
☆47 · Updated 2 years ago
Alternatives and similar repositories for finetune-clip-huggingface:
Users who are interested in finetune-clip-huggingface are comparing it to the repositories listed below.
- OCR-VQGAN, a discrete image encoder (tokenizer and detokenizer) for figure images in Paper2Fig100k dataset. Implementation of OCR Percept… ☆81 · Updated 2 years ago
- [CVPR 2024] Dynamic Prompt Optimizing for Text-to-Image Generation ☆70 · Updated 9 months ago
- ALIGN trained on COYO-dataset ☆29 · Updated 11 months ago
- CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts ☆147 · Updated 10 months ago
- Official Pytorch implementation of "CompoDiff: Versatile Composed Image Retrieval With Latent Diffusion" (TMLR 2024) ☆84 · Updated 2 months ago
- Visual Instruction-guided Explainable Metric. Code for "Towards Explainable Metrics for Conditional Image Synthesis Evaluation" (ACL 2024… ☆35 · Updated 5 months ago
- [ACM TOMM 2023] - Composed Image Retrieval using Contrastive Learning and Task-oriented CLIP-based Features ☆177 · Updated last year
- Training code for CLIP-FlanT5 ☆26 · Updated 8 months ago
- Dreambooth (LoRA) with well-organized code structure. Naive adaptation from 🤗Diffusers. ☆14 · Updated last year
- The official repo for “TextCoT: Zoom In for Enhanced Multimodal Text-Rich Image Understanding”. ☆39 · Updated 7 months ago
- FuseCap: Leveraging Large Language Models for Enriched Fused Image Captions ☆55 · Updated last year
- [NeurIPS 2023] Customize spatial layouts for conditional image synthesis models, e.g., ControlNet, using GPT ☆136 · Updated 11 months ago
- Code for "DreamEdit: Subject-driven Image Editing" (TMLR2023) ☆107 · Updated last year
- Reproducible scaling laws for contrastive language-image learning (https://arxiv.org/abs/2212.07143) ☆164 · Updated last year
- Official Pytorch implementation of LinCIR: Language-only Training of Zero-shot Composed Image Retrieval (CVPR 2024) ☆132 · Updated 9 months ago
- ACM MM'23 (oral), SUR-adapter for pre-trained diffusion models can acquire the powerful semantic understanding and reasoning capabilities… ☆119 · Updated last year
- Densely Captioned Images (DCI) dataset repository. ☆177 · Updated 9 months ago
- [CVPR 24] The repository provides code for running inference and training for "Segment and Caption Anything" (SCA), links for downloadin… ☆220 · Updated 6 months ago
- [ICLR2025] Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want ☆71 · Updated 2 months ago
- CLIP-based aesthetics predictor inspired by the interface of 🤗 huggingface transformers. ☆36 · Updated 10 months ago
- Official implementation and dataset for the NAACL 2024 paper "ComCLIP: Training-Free Compositional Image and Text Matching" ☆35 · Updated 8 months ago
- Official PyTorch implementation of "Scaling Up Personalized Image Aesthetic Assessment via Task Vector Customization" (ECCV 2024) ☆23 · Updated last month
- [ECCV 2024] Official PyTorch implementation of DreamLIP: Language-Image Pre-training with Long Captions ☆129 · Updated 4 months ago
- Democratization of "PaLI: A Jointly-Scaled Multilingual Language-Image Model" ☆89 · Updated last year
- ReCo: Region-Controlled Text-to-Image Generation, CVPR 2023 ☆126 · Updated last year
- Improved Implementation for Training GLIGEN: Open-Set Grounded Text-to-Image Generation ☆43 · Updated 10 months ago
- [ICCV 2023] BoxDiff: Text-to-Image Synthesis with Training-Free Box-Constrained Diffusion ☆263 · Updated 5 months ago
- Davidsonian Scene Graph (DSG) for Text-to-Image Evaluation (ICLR 2024) ☆86 · Updated 4 months ago
- Fine tuning OpenAI's CLIP model on Indian Fashion Dataset ☆51 · Updated last year
- Source code for paper: "AltDiffusion: A multilingual Text-to-Image diffusion model" ☆39 · Updated last year