damian0815 / finetune-clip-huggingface
Finetuning CLIP on a small image/text dataset using huggingface libs
☆48 Updated 2 years ago
Alternatives and similar repositories for finetune-clip-huggingface
Users that are interested in finetune-clip-huggingface are comparing it to the libraries listed below
- OCR-VQGAN, a discrete image encoder (tokenizer and detokenizer) for figure images in Paper2Fig100k dataset. Implementation of OCR Percept… ☆81 Updated 2 years ago
- Fine tuning OpenAI's CLIP model on Indian Fashion Dataset ☆50 Updated 2 years ago
- [NeurIPS2023] This is the official code of the paper "GlyphControl: Glyph Conditional Control for Visual Text Generation" ☆233 Updated last year
- ☆92 Updated last year
- Image Editing Anything ☆116 Updated 2 years ago
- Reproducible scaling laws for contrastive language-image learning (https://arxiv.org/abs/2212.07143) ☆169 Updated 3 weeks ago
- [ACM TOMM 2023] - Composed Image Retrieval using Contrastive Learning and Task-oriented CLIP-based Features ☆180 Updated last year
- [CVPR 2024] Dynamic Prompt Optimizing for Text-to-Image Generation ☆72 Updated last year
- ☆99 Updated last year
- The official PyTorch implementation for arXiv'23 paper 'LayoutDETR: Detection Transformer Is a Good Multimodal Layout Designer' ☆100 Updated 2 months ago
- CLIP-based aesthetics predictor inspired by the interface of 🤗 huggingface transformers. ☆38 Updated last year
- This repository is the code of our paper "DiffUTE: Universal Text Editing Diffusion Model" (NeurIPS'2023). ☆138 Updated 3 months ago
- LLMScore: Unveiling the Power of Large Language Models in Text-to-Image Synthesis Evaluation ☆132 Updated last year
- Text-To-Image Generation with Chinese Characters ☆130 Updated last year
- ☆61 Updated 2 years ago
- Democratization of "PaLI: A Jointly-Scaled Multilingual Language-Image Model" ☆91 Updated last year
- Code for Shifted Diffusion for Text-to-image Generation (CVPR 2023) ☆161 Updated 2 years ago
- My implementation of "Patch n’ Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution" ☆242 Updated 3 months ago
- A simple Segment Anything WebUI based on Gradio. ☆80 Updated 2 years ago
- ☆96 Updated 11 months ago
- CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts ☆151 Updated last year
- A curated list of text-based image manipulation methods. ☆84 Updated 8 months ago
- [IEEE TPAMI] Hi-SAM: Marrying Segment Anything Model for Hierarchical Text Segmentation ☆290 Updated last month
- ECCV2024_Parrot Captions Teach CLIP to Spot Text ☆66 Updated 10 months ago
- [AAAI 2023] Painterly image harmonization in both spatial domain and frequency domain. ☆55 Updated last month
- Official Pytorch implementation of "CompoDiff: Versatile Composed Image Retrieval With Latent Diffusion" (TMLR 2024) ☆85 Updated 5 months ago
- A demo of fine-tuning Stable Diffusion on Pokemon-Blip-Captions in English, Japanese and Chinese Corpus ☆38 Updated 2 years ago
- Image Prompter for Gradio ☆92 Updated last year
- Implementation of PALI3 from the paper "PALI-3 VISION LANGUAGE MODELS: SMALLER, FASTER, STRONGER" ☆144 Updated 3 months ago
- Text-DIAE: A Self-Supervised Degradation Invariant Autoencoders for Text Recognition and Document Enhancement - AAAI 2023 ☆26 Updated 2 years ago