damian0815 / finetune-clip-huggingface
Finetuning CLIP on a small image/text dataset using huggingface libs
☆52 Updated 2 years ago
Alternatives and similar repositories for finetune-clip-huggingface
Users who are interested in finetune-clip-huggingface are comparing it to the libraries listed below.
- [ACM TOMM 2023] - Composed Image Retrieval using Contrastive Learning and Task-oriented CLIP-based Features ☆187 Updated 2 years ago
- OCR-VQGAN, a discrete image encoder (tokenizer and detokenizer) for figure images in Paper2Fig100k dataset. Implementation of OCR Percept… ☆81 Updated 2 years ago
- Fine tuning OpenAI's CLIP model on Indian Fashion Dataset ☆52 Updated 2 years ago
- CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts ☆157 Updated last year
- Reproducible scaling laws for contrastive language-image learning (https://arxiv.org/abs/2212.07143) ☆178 Updated 4 months ago
- Image Editing Anything ☆116 Updated 2 years ago
- Official Pytorch implementation of "CompoDiff: Versatile Composed Image Retrieval With Latent Diffusion" (TMLR 2024) ☆87 Updated 9 months ago
- [AAAI 2023] Painterly image harmonization in both spatial domain and frequency domain. ☆55 Updated 5 months ago
- Open source implementation of "A Self-Supervised Descriptor for Image Copy Detection" (SSCD). ☆362 Updated 3 years ago
- Dreambooth (LoRA) with well-organized code structure. Naive adaptation from 🤗Diffusers. ☆14 Updated 2 years ago
- The official PyTorch implementation for arXiv'23 paper 'LayoutDETR: Detection Transformer Is a Good Multimodal Layout Designer' ☆100 Updated 5 months ago
- Official Pytorch implementation of LinCIR: Language-only Training of Zero-shot Composed Image Retrieval (CVPR 2024) ☆136 Updated last year
- This is a public repository for Image Clustering Conditioned on Text Criteria (IC|TC) ☆91 Updated last year
- NÜWA-LIP: Language Guided Image Inpainting with Defect-free VQGAN ☆40 Updated 2 years ago
- Democratization of "PaLI: A Jointly-Scaled Multilingual Language-Image Model" ☆91 Updated last year
- ☆92 Updated 2 years ago
- My implementation of "Patch n’ Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution" ☆262 Updated last week
- ☆66 Updated 11 months ago
- Precision Search through Multi-Style Inputs ☆72 Updated 3 months ago
- [CVPR 2024] Official implementation of "ViTamin: Designing Scalable Vision Models in the Vision-language Era" ☆210 Updated last year
- The benchmark of SOTA text-to-image diffusion models with a new benchmarking strategy based on MiniGPT-4, namely X-IQE. ☆126 Updated 2 years ago
- ☆99 Updated last year
- This repository is the code of our paper "DiffUTE: Universal Text Editing Diffusion Model" (NeurIPS'2023). ☆143 Updated 6 months ago
- Official Implementations "StyleDiffusion: Prompt-Embedding Inversion for Text-Based Editing" (CVMJ2024) ☆78 Updated last year
- [ICCV 2023] - Zero-shot Composed Image Retrieval with Textual Inversion ☆192 Updated 3 months ago
- Source code for paper: "AltDiffusion: A multilingual Text-to-Image diffusion model" ☆42 Updated last year
- PyTorch code for "Fine-grained Image Captioning with CLIP Reward" (Findings of NAACL 2022) ☆246 Updated 4 months ago
- Finetuning CLIP for Few Shot Learning ☆46 Updated 3 years ago
- ECCV2024_Parrot Captions Teach CLIP to Spot Text ☆65 Updated last year
- ACM MM'23 (oral), SUR-adapter for pre-trained diffusion models can acquire the powerful semantic understanding and reasoning capabilities… ☆121 Updated last month