☆29 Jun 10, 2024 Updated last year
Alternatives and similar repositories for CountCLIP
Users that are interested in CountCLIP are comparing it to the libraries listed below
- Code for "CLIP Behaves like a Bag-of-Words Model Cross-modally but not Uni-modally" ☆20 Feb 14, 2025 Updated last year
- CVPR 2024 Official Repository ☆12 Mar 27, 2024 Updated last year
- Ranking-Consistent Language-Image Pretraining ☆12 Oct 24, 2025 Updated 4 months ago
- [NeurIPS 2024] Official implementation of "Unlocking the Capabilities of Masked Generative Models for Image Synthesis via Self-Guidance" ☆17 Dec 4, 2024 Updated last year
- Official codebase for the NeurIPS 2023 paper: Towards Last-layer Retraining for Group Robustness with Fewer Annotations. https://arxiv.or… ☆11 May 15, 2024 Updated last year
- [ICCV 2023] Going Beyond Nouns With Vision & Language Models Using Synthetic Data ☆14 Sep 30, 2023 Updated 2 years ago
- ☆11 Sep 15, 2023 Updated 2 years ago
- Official implementation of "Make It Count: Text-to-Image Generation with an Accurate Number of Objects" (CVPR 2025) ☆97 Mar 12, 2025 Updated 11 months ago
- Code to reproduce the experiments in the paper: Does CLIP Bind Concepts? Probing Compositionality in Large Image Models. ☆16 Oct 14, 2023 Updated 2 years ago
- ☆16 Feb 24, 2023 Updated 3 years ago
- Code for the CCE algorithm proposed in "Towards Compositionality in Concept Learning" at ICML 2024. ☆17 Jun 2, 2024 Updated last year
- GeckoNum Benchmark for T2I Model Eval. ☆15 Dec 5, 2024 Updated last year
- ALTo: Adaptive-Length Tokenizer for Autoregressive Mask Generation ☆27 May 27, 2025 Updated 9 months ago
- ☆17 Aug 8, 2024 Updated last year
- [ECCV 2024] Teach CLIP to Develop a Number Sense for Ordinal Regression ☆19 Apr 1, 2025 Updated 11 months ago
- If CLIP Could Talk: Understanding Vision-Language Model Representations Through Their Preferred Concept Descriptions ☆17 Apr 4, 2024 Updated last year
- [ACM MM23] CLIP-Count: Towards Text-Guided Zero-Shot Object Counting ☆123 Mar 20, 2024 Updated last year
- Official repository for the paper "Instance-Wise Holistic Order Prediction in Natural Scenes". ☆26 Jan 11, 2024 Updated 2 years ago
- ☆20 May 3, 2025 Updated 9 months ago
- Code for "Are “Hierarchical” Visual Representations Hierarchical?" in NeurIPS Workshop for Symmetry and Geometry in Neural Representation… ☆22 Nov 8, 2023 Updated 2 years ago
- [ECCV 2024] Official code for "Pseudo-RIS: Distinctive Pseudo-supervision Generation for Referring Image Segmentation" ☆18 Jul 31, 2025 Updated 7 months ago
- Official Implementation of ISR-DPO: Aligning Large Multimodal Models for Videos by Iterative Self-Retrospective DPO (AAAI'25) ☆23 Nov 25, 2025 Updated 3 months ago
- Detail-Oriented CLIP for Fine-Grained Tasks (ICLR SSI-FM 2025) ☆57 Mar 26, 2025 Updated 11 months ago
- Code for Commonsense-T2I Challenge: Can Text-to-Image Generation Models Understand Commonsense? [COLM 2024] ☆24 Aug 13, 2024 Updated last year
- ☆27 Jun 4, 2024 Updated last year
- [IJCV 2026] HiPrompt: Tuning-free Higher-Resolution Generation with Hierarchical MLLM Prompts ☆26 Feb 28, 2025 Updated last year
- ☆24 Sep 12, 2023 Updated 2 years ago
- PyTorch implementation of "Masked-Attention Diffusion Guidance for Spatially Controlling Text-to-Image Generation" [The Visual Computer… ☆25 Jan 7, 2025 Updated last year
- Awesome Vision-Language Compositionality, a comprehensive curation of research papers in literature. ☆34 Feb 13, 2025 Updated last year
- Composed Video Retrieval ☆62 May 2, 2024 Updated last year
- ☆39 May 20, 2025 Updated 9 months ago
- Code and data setup for the paper "Are Diffusion Models Vision-and-language Reasoners?" ☆33 Mar 15, 2024 Updated last year
- FineCLIP: Self-distilled Region-based CLIP for Better Fine-grained Understanding (NIPS24) ☆34 Nov 12, 2025 Updated 3 months ago
- The code for paper entitled "Data-Driven Modulation Optimization with LMMSE Equalization for Reliability Enhancement in Underwater Acoust… ☆19 Oct 4, 2025 Updated 4 months ago
- Methods for using OpenFace in R ☆11 Feb 26, 2024 Updated 2 years ago
- FFNet: MetaMixer-based Efficient Convolutional Mixer Design ☆31 Mar 11, 2025 Updated 11 months ago
- Evaluation and dataset construction code for the CVPR 2025 paper "Vision-Language Models Do Not Understand Negation" ☆46 Updated this week
- [CVPR'24] Validation-free few-shot adaptation of CLIP, using a well-initialized Linear Probe (ZSLP) and class-adaptive constraints (CLAP)… ☆81 Jun 7, 2025 Updated 8 months ago
- Video as Conditional Graph Hierarchy for Multi-Granular Question Answering (AAAI'22, Oral) ☆34 Sep 17, 2022 Updated 3 years ago