damian0815 / finetune-clip-huggingface
Finetuning CLIP on a small image/text dataset using huggingface libs
☆47 · Updated 2 years ago
Alternatives and similar repositories for finetune-clip-huggingface
Users who are interested in finetune-clip-huggingface are comparing it to the repositories listed below.
- Official Pytorch implementation of "CompoDiff: Versatile Composed Image Retrieval With Latent Diffusion" (TMLR 2024) ☆85 · Updated 4 months ago
- OCR-VQGAN, a discrete image encoder (tokenizer and detokenizer) for figure images in Paper2Fig100k dataset. Implementation of OCR Percept… ☆81 · Updated 2 years ago
- Visual Instruction-guided Explainable Metric. Code for "Towards Explainable Metrics for Conditional Image Synthesis Evaluation" (ACL 2024… ☆45 · Updated 6 months ago
- [CVPR 2024] Dynamic Prompt Optimizing for Text-to-Image Generation ☆69 · Updated 10 months ago
- Fine tuning OpenAI's CLIP model on Indian Fashion Dataset ☆50 · Updated 2 years ago
- This is implementation of finetuning BLIP model for Visual Question Answering ☆68 · Updated last year
- [ECCV 2024] Official PyTorch implementation of DreamLIP: Language-Image Pre-training with Long Captions ☆131 · Updated 3 weeks ago
- Text-DIAE: A Self-Supervised Degradation Invariant Autoencoders for Text Recognition and Document Enhancement - AAAI 2023 ☆25 · Updated last year
- CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts ☆149 · Updated 11 months ago
- Official Pytorch implementation of LinCIR: Language-only Training of Zero-shot Composed Image Retrieval (CVPR 2024) ☆134 · Updated 10 months ago
- Training code for CLIP-FlanT5 ☆26 · Updated 10 months ago
- Official implementation of our paper "Finetuned Multimodal Language Models are High-Quality Image-Text Data Filters". ☆57 · Updated last month
- This is a public repository for Image Clustering Conditioned on Text Criteria (IC|TC) ☆89 · Updated last year
- Democratization of "PaLI: A Jointly-Scaled Multilingual Language-Image Model" ☆90 · Updated last year
- Davidsonian Scene Graph (DSG) for Text-to-Image Evaluation (ICLR 2024) ☆88 · Updated 5 months ago
- Dreambooth (LoRA) with well-organized code structure. Naive adaptation from 🤗Diffusers. ☆12 · Updated 2 years ago
- FuseCap: Leveraging Large Language Models for Enriched Fused Image Captions ☆55 · Updated last year
- [ACM TOMM 2023] - Composed Image Retrieval using Contrastive Learning and Task-oriented CLIP-based Features ☆178 · Updated last year
- Official PyTorch implementation of "Scaling Up Personalized Image Aesthetic Assessment via Task Vector Customization" (ECCV 2024) ☆24 · Updated 2 months ago
- Official Implementations "StyleDiffusion: Prompt-Embedding Inversion for Text-Based Editing" (CVMJ2024) ☆75 · Updated 10 months ago
- ECCV2024_Parrot Captions Teach CLIP to Spot Text ☆66 · Updated 9 months ago
- [AAAI 2023] CoordFill: Efficient High-Resolution Image Inpainting via Parameterized Coordinate Querying ☆86 · Updated last year
- Code and Models for "GeneCIS A Benchmark for General Conditional Image Similarity" ☆58 · Updated last year
- [NeurIPS 2024] MoVA: Adapting Mixture of Vision Experts to Multimodal Context ☆156 · Updated 8 months ago
- Code for "DreamEdit: Subject-driven Image Editing" (TMLR2023) ☆108 · Updated last year
- The official repo for “TextCoT: Zoom In for Enhanced Multimodal Text-Rich Image Understanding”. ☆40 · Updated 8 months ago
- Improved Implementation for Training GLIGEN: Open-Set Grounded Text-to-Image Generation ☆43 · Updated last year
- LLMScore: Unveiling the Power of Large Language Models in Text-to-Image Synthesis Evaluation ☆132 · Updated last year
- (ACL'2023) MultiCapCLIP: Auto-Encoding Prompts for Zero-Shot Multilingual Visual Captioning ☆35 · Updated 9 months ago
- [ICLR 2024] Official code for the paper "LLM Blueprint: Enabling Text-to-Image Generation with Complex and Detailed Prompts" ☆77 · Updated last year