damian0815 / finetune-clip-huggingfaceLinks
Finetuning CLIP on a small image/text dataset using huggingface libs
☆52Updated 2 years ago
Alternatives and similar repositories for finetune-clip-huggingface
Users that are interested in finetune-clip-huggingface are comparing it to the libraries listed below
Sorting:
- OCR-VQGAN, a discrete image encoder (tokenizer and detokenizer) for figure images in Paper2Fig100k dataset. Implementation of OCR Percept…☆81Updated 2 years ago
- [ACM TOMM 2023] - Composed Image Retrieval using Contrastive Learning and Task-oriented CLIP-based Features☆186Updated 2 years ago
- CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts☆155Updated last year
- Reproducible scaling laws for contrastive language-image learning (https://arxiv.org/abs/2212.07143)☆177Updated 3 months ago
- [IEEE TPAMI] Hi-SAM: Marrying Segment Anything Model for Hierarchical Text Segmentation☆318Updated 4 months ago
- Text-DIAE: A Self-Supervised Degradation Invariant Autoencoders for Text Recognition and Document Enhancement - AAAI 2023☆28Updated 2 years ago
- This repository is the code of our paper "DiffUTE: Universal Text Editing Diffusion Model" (NeurIPS'2023).☆141Updated 6 months ago
- [CVPR 24] The repository provides code for running inference and training for "Segment and Caption Anything" (SCA) , links for downloadin…☆228Updated last year
- 🎁 A Large-scale Multi-modal E-Commerce Products Dataset (LTDL@IJCAI-21 Best Dataset & Pattern Recognition 2023)☆36Updated last year
- Official Pytorch implementation of LinCIR: Language-only Training of Zero-shot Composed Image Retrieval (CVPR 2024)☆137Updated last year
- Democratization of "PaLI: A Jointly-Scaled Multilingual Language-Image Model"☆91Updated last year
- The official PyTorch implementation for arXiv'23 paper 'LayoutDETR: Detection Transformer Is a Good Multimodal Layout Designer'☆100Updated 4 months ago
- Search photos on Unsplash based on OpenAI's CLIP model, support search with joint image+text queries and attention visualization.☆223Updated 4 years ago
- Text-To-Image Generation with Chinese Characters☆131Updated 2 years ago
- ECCV2024_Parrot Captions Teach CLIP to Spot Text☆65Updated last year
- [CVPR 2024] Dynamic Prompt Optimizing for Text-to-Image Generation☆80Updated last year
- [NeurIPS2023] This is the official code of the paper "GlyphControl: Glyph Conditional Control for Visual Text Generation"☆235Updated last year
- ☆98Updated last year
- [CVPR 2024] Official implementation of "ViTamin: Designing Scalable Vision Models in the Vision-language Era"☆211Updated last year
- Image Editing Anything☆116Updated 2 years ago
- [NeurIPS 2023] Text data, code and pre-trained models for paper "Improving CLIP Training with Language Rewrites"☆286Updated last year
- ☆92Updated 2 years ago
- ☆99Updated last year
- LLMScore: Unveiling the Power of Large Language Models in Text-to-Image Synthesis Evaluation☆132Updated last year
- Official Pytorch implementation of "CompoDiff: Versatile Composed Image Retrieval With Latent Diffusion" (TMLR 2024)☆87Updated 8 months ago
- Implementation of Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic☆279Updated 3 years ago
- The benchmark of SOTA text-to-image diffusion models with a new benchmarking strategy based on MiniGPT-4, namely X-IQE.☆125Updated 2 years ago
- CLIPScore EMNLP code☆236Updated 2 years ago
- My implementation of "Patch n’ Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution"☆256Updated last week
- Densely Captioned Images (DCI) dataset repository.☆191Updated last year