damian0815 / finetune-clip-huggingfaceLinks
Finetuning CLIP on a small image/text dataset using huggingface libs
☆52Updated 3 years ago
Alternatives and similar repositories for finetune-clip-huggingface
Users that are interested in finetune-clip-huggingface are comparing it to the libraries listed below
Sorting:
- OCR-VQGAN, a discrete image encoder (tokenizer and detokenizer) for figure images in Paper2Fig100k dataset. Implementation of OCR Percept…☆82Updated 2 years ago
- Reproducible scaling laws for contrastive language-image learning (https://arxiv.org/abs/2212.07143)☆188Updated 6 months ago
- [ACM TOMM 2023] - Composed Image Retrieval using Contrastive Learning and Task-oriented CLIP-based Features☆189Updated 2 years ago
- [NeurIPS2023] This is the official code of the paper "GlyphControl: Glyph Conditional Control for Visual Text Generation"☆239Updated last year
- [CVPR 2024] Dynamic Prompt Optimizing for Text-to-Image Generation☆84Updated last year
- Official Pytorch implementation of "CompoDiff: Versatile Composed Image Retrieval With Latent Diffusion" (TMLR 2024)☆88Updated 11 months ago
- ECCV2024_Parrot Captions Teach CLIP to Spot Text☆66Updated last year
- CLIPScore EMNLP code☆243Updated 3 years ago
- Dreambooth (LoRA) with well-organized code structure. Naive adaptation from 🤗Diffusers.☆15Updated 2 years ago
- CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts☆162Updated last year
- Democratization of "PaLI: A Jointly-Scaled Multilingual Language-Image Model"☆92Updated last year
- This is a public repository for Image Clustering Conditioned on Text Criteria (IC|TC)☆92Updated last year
- Densely Captioned Images (DCI) dataset repository.☆195Updated last year
- Implementation of PALI3 from the paper PALI-3 VISION LANGUAGE MODELS: SMALLER, FASTER, STRONGER"☆146Updated this week
- Better Aligning Text-to-Image Models with Human Preference. ICCV 2023☆293Updated 2 years ago
- [ICCV 2023] BoxDiff: Text-to-Image Synthesis with Training-Free Box-Constrained Diffusion☆273Updated last year
- [NeurIPS 2023] Text data, code and pre-trained models for paper "Improving CLIP Training with Language Rewrites"☆287Updated last year
- LLMScore: Unveiling the Power of Large Language Models in Text-to-Image Synthesis Evaluation☆134Updated 2 years ago
- Image Editing Anything☆116Updated 2 years ago
- Pytorch code for paper From CLIP to DINO: Visual Encoders Shout in Multi-modal Large Language Models☆205Updated 11 months ago
- ACM MM'23 (oral), SUR-adapter for pre-trained diffusion models can acquire the powerful semantic understanding and reasoning capabilities…☆121Updated 4 months ago
- Davidsonian Scene Graph (DSG) for Text-to-Image Evaluation (ICLR 2024)☆102Updated last year
- The official PyTorch implementation for arXiv'23 paper 'LayoutDETR: Detection Transformer Is a Good Multimodal Layout Designer'☆102Updated 7 months ago
- ☆100Updated 2 years ago
- A curated list of text-based image manipulation methods.☆84Updated last year
- [CVPR'24] The repository provides code for running inference and training for "Segment and Caption Anything" (SCA) , links for downloadin…☆231Updated last year
- Data release for the ImageInWords (IIW) paper.☆224Updated last year
- [IJCV 2024] MosaicFusion: Diffusion Models as Data Augmenters for Large Vocabulary Instance Segmentation☆128Updated last year
- ☆130Updated 2 years ago
- The benchmark of SOTA text-to-image diffusion models with a new benchmarking strategy based on MiniGPT-4, namely X-IQE.☆127Updated last month