opendatalab / CLIP-Parrot-Bias
[ECCV 2024] Parrot Captions Teach CLIP to Spot Text
☆66 Updated 8 months ago
Alternatives and similar repositories for CLIP-Parrot-Bias:
Users who are interested in CLIP-Parrot-Bias are comparing it to the repositories listed below:
- [NeurIPS-24] This is the official implementation of the paper "DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effect… ☆35 Updated 10 months ago
- INF-LLaVA: Dual-perspective Perception for High-Resolution Multimodal Large Language Model ☆42 Updated 9 months ago
- ☆19 Updated last year
- [PR 2024] A large Cross-Modal Video Retrieval Dataset with Reading Comprehension ☆26 Updated last year
- Training code for CLIP-FlanT5 ☆26 Updated 9 months ago
- VPEval Codebase from Visual Programming for Text-to-Image Generation and Evaluation (NeurIPS 2023) ☆44 Updated last year
- [ICLR2025] Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want ☆72 Updated 3 months ago
- Official repo for StableLLAVA ☆95 Updated last year
- Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment ☆50 Updated 4 months ago
- ☆133 Updated last year
- MLLM-DataEngine: An Iterative Refinement Approach for MLLM ☆46 Updated 11 months ago
- Official implementation of our paper "Finetuned Multimodal Language Models are High-Quality Image-Text Data Filters". ☆56 Updated 3 weeks ago
- [ICLR2025] MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models ☆69 Updated 7 months ago
- [NeurIPS 2024] EvolveDirector: Approaching Advanced Text-to-Image Generation with Large Vision-Language Models. ☆47 Updated 6 months ago
- The official repo for “TextCoT: Zoom In for Enhanced Multimodal Text-Rich Image Understanding”. ☆39 Updated 7 months ago
- Visual Programming for Text-to-Image Generation and Evaluation (NeurIPS 2023) ☆56 Updated last year
- ☆57 Updated last year
- ☆17 Updated 6 months ago
- The official repository for the RealSyn dataset ☆28 Updated last week
- Official code for "What Makes for Good Visual Tokenizers for Large Language Models?". ☆58 Updated last year
- Official repository of "CoMP: Continual Multimodal Pre-training for Vision Foundation Models" ☆24 Updated last month
- This is the official repo for ByteVideoLLM/Dynamic-VLM ☆20 Updated 4 months ago
- Official implementation of MARS: Mixture of Auto-Regressive Models for Fine-grained Text-to-image Synthesis ☆83 Updated 9 months ago
- Official Pytorch implementation of "CompoDiff: Versatile Composed Image Retrieval With Latent Diffusion" (TMLR 2024) ☆85 Updated 3 months ago
- ☆34 Updated last year
- (arXiv.2405.18406) RACCooN: A Versatile Instructional Video Editing Framework with Auto-Generated Narratives ☆36 Updated 6 months ago
- FuseCap: Leveraging Large Language Models for Enriched Fused Image Captions ☆55 Updated last year
- Code and Models for "GeneCIS: A Benchmark for General Conditional Image Similarity" ☆58 Updated last year
- T2VScore: Towards A Better Metric for Text-to-Video Generation ☆79 Updated last year
- [CVPR 2024] Dynamic Prompt Optimizing for Text-to-Image Generation ☆70 Updated 9 months ago