damian0815 / finetune-clip-huggingfaceLinks
Finetuning CLIP on a small image/text dataset using huggingface libs
☆47Updated 2 years ago
Alternatives and similar repositories for finetune-clip-huggingface
Users that are interested in finetune-clip-huggingface are comparing it to the libraries listed below
Sorting:
- Official Pytorch implementation of "CompoDiff: Versatile Composed Image Retrieval With Latent Diffusion" (TMLR 2024)☆84Updated 4 months ago
- [ACM TOMM 2023] - Composed Image Retrieval using Contrastive Learning and Task-oriented CLIP-based Features☆177Updated last year
- OCR-VQGAN, a discrete image encoder (tokenizer and detokenizer) for figure images in Paper2Fig100k dataset. Implementation of OCR Percept…☆81Updated 2 years ago
- This is implementation of finetuning BLIP model for Visual Question Answering☆72Updated last year
- [ICCV 2023] - Zero-shot Composed Image Retrieval with Textual Inversion☆175Updated last year
- Democratization of "PaLI: A Jointly-Scaled Multilingual Language-Image Model"☆91Updated last year
- Text-DIAE: A Self-Supervised Degradation Invariant Autoencoders for Text Recognition and Document Enhancement - AAAI 2023☆25Updated last year
- [CVPR 2023] VoP: Text-Video Co-operative Prompt Tuning for Cross-Modal Retrieval☆38Updated 2 years ago
- [CVPR 2024] Dynamic Prompt Optimizing for Text-to-Image Generation☆72Updated 11 months ago
- Implementation of PALI3 from the paper PALI-3 VISION LANGUAGE MODELS: SMALLER, FASTER, STRONGER"☆145Updated 2 months ago
- Official Pytorch implementation of LinCIR: Language-only Training of Zero-shot Composed Image Retrieval (CVPR 2024)☆134Updated 11 months ago
- CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts☆149Updated last year
- [ICCV 2023] BoxDiff: Text-to-Image Synthesis with Training-Free Box-Constrained Diffusion☆264Updated 7 months ago
- (ACL'2023) MultiCapCLIP: Auto-Encoding Prompts for Zero-Shot Multilingual Visual Captioning☆35Updated 10 months ago
- [ECCV 2024] Official PyTorch implementation of DreamLIP: Language-Image Pre-training with Long Captions☆132Updated last month
- This repository is the code of our paper "DiffUTE: Universal Text Editing Diffusion Model" (NeurIPS'2023).☆134Updated 2 months ago
- A curated list of video-text datasets in a variety of languages. These datasets can be used for video captioning (video description) or v…☆36Updated last year
- Source code for paper: "AltDiffusion: A multilingual Text-to-Image diffusion model"☆41Updated last year
- Visual Instruction-guided Explainable Metric. Code for "Towards Explainable Metrics for Conditional Image Synthesis Evaluation" (ACL 2024…☆45Updated 7 months ago
- ☆97Updated last year
- [NeurIPS 2023] Customize spatial layouts for conditional image synthesis models, e.g., ControlNet, using GPT☆136Updated last year
- [CVPR 24] The repository provides code for running inference and training for "Segment and Caption Anything" (SCA) , links for downloadin…☆225Updated 8 months ago
- The official PyTorch implementation for arXiv'23 paper 'LayoutDETR: Detection Transformer Is a Good Multimodal Layout Designer'☆97Updated last month
- [CVPR 2024] Official repo for "InteractDiffusion: Interaction-Control for Text-to-Image Diffusion Model".☆121Updated last week
- Search photos on Unsplash based on OpenAI's CLIP model, support search with joint image+text queries and attention visualization.☆222Updated 3 years ago
- ReCo: Region-Controlled Text-to-Image Generation, CVPR 2023☆126Updated last year
- ☆97Updated last year
- Dreambooth (LoRA) with well-organized code structure. Naive adaptation from 🤗Diffusers.☆13Updated 2 years ago
- Reproducible scaling laws for contrastive language-image learning (https://arxiv.org/abs/2212.07143)☆167Updated this week
- CLIPScore EMNLP code☆226Updated 2 years ago