SforAiDl / CountCLIP
☆20 Updated 7 months ago
Alternatives and similar repositories for CountCLIP:
Users who are interested in CountCLIP are comparing it to the libraries listed below:
- [ECCV 2024] API: Attention Prompting on Image for Large Vision-Language Models ☆60 Updated 3 months ago
- Official repository of paper "Subobject-level Image Tokenization" ☆65 Updated 9 months ago
- ☆35 Updated 6 months ago
- [ICLR 2025] SAFREE: Training-Free and Adaptive Guard for Safe Text-to-Image and Video Generation ☆21 Updated last week
- Visual Programming for Text-to-Image Generation and Evaluation (NeurIPS 2023) ☆54 Updated last year
- Augmenting with Language-guided Image Augmentation (ALIA) ☆70 Updated last year
- Code and data for the paper "Emergent Visual-Semantic Hierarchies in Image-Text Representations" (ECCV 2024) ☆25 Updated 5 months ago
- Source code for the NeurIPS'23 paper "Dream the Impossible: Outlier Imagination with Diffusion Models" ☆64 Updated last week
- ☆23 Updated 2 weeks ago
- (arXiv.2405.18406) RACCooN: A Versatile Instructional Video Editing Framework with Auto-Generated Narratives ☆32 Updated 3 months ago
- Matryoshka Multimodal Models ☆93 Updated last week
- [NeurIPS 2024] Calibrated Self-Rewarding Vision Language Models ☆63 Updated 7 months ago
- Code for "AVG-LLaVA: A Multimodal Large Model with Adaptive Visual Granularity" ☆19 Updated 3 months ago
- Davidsonian Scene Graph (DSG) for Text-to-Image Evaluation (ICLR 2024) ☆81 Updated last month
- Sparse Linear Concept Embeddings ☆81 Updated 5 months ago
- 🔥 [CVPR 2024] Official implementation of "See, Say, and Segment: Teaching LMMs to Overcome False Premises (SESAME)" ☆32 Updated 7 months ago
- VPEval Codebase from Visual Programming for Text-to-Image Generation and Evaluation (NeurIPS 2023) ☆44 Updated last year
- ☆36 Updated last week
- RichHF-18K dataset contains rich human feedback labels we collected for our CVPR'24 paper: https://arxiv.org/pdf/2312.10240, along with t… ☆115 Updated 7 months ago
- Official implementation of our paper "Finetuned Multimodal Language Models are High-Quality Image-Text Data Filters". ☆44 Updated last month
- Official implementation of "Describing Differences in Image Sets with Natural Language" (CVPR 2024 Oral) ☆112 Updated 9 months ago
- [ECCV 2024] Official Release of SILC: Improving vision language pretraining with self-distillation ☆40 Updated 3 months ago
- NegCLIP. ☆30 Updated last year
- [ICLR 2025] VL-ICL Bench: The Devil in the Details of Multimodal In-Context Learning ☆35 Updated this week
- Official implementation of "Why are Visually-Grounded Language Models Bad at Image Classification?" (NeurIPS 2024) ☆66 Updated 3 months ago
- [CVPR 2024] The official implementation of the paper "Synthesize, Diagnose, and Optimize: Towards Fine-Grained Vision-Language Understanding" ☆34 Updated 2 months ago
- ☆31 Updated 4 months ago
- ☆40 Updated 10 months ago
- Official PyTorch implementation of "Interpreting and Editing Vision-Language Representations to Mitigate Hallucinations" ☆53 Updated this week
- Adapting LLaMA Decoder to Vision Transformer ☆26 Updated 8 months ago