SforAiDl / CountCLIP
☆22Updated 10 months ago
Alternatives and similar repositories for CountCLIP:
Users who are interested in CountCLIP are comparing it to the libraries listed below.
- ☆37Updated 9 months ago
- Official repository of paper "Subobject-level Image Tokenization"☆69Updated 3 weeks ago
- Code and data for the paper "Emergent Visual-Semantic Hierarchies in Image-Text Representations" (ECCV 2024)☆27Updated 8 months ago
- Code for "Scaling Language-Free Visual Representation Learning" paper (Web-SSL).☆67Updated this week
- PyTorch code for "Contrastive Region Guidance: Improving Grounding in Vision-Language Models without Training"☆34Updated last year
- [ICLR 2025] - Cross the Gap: Exposing the Intra-modal Misalignment in CLIP via Modality Inversion☆36Updated this week
- ☆50Updated last month
- [ICLR 2025] SAFREE: Training-Free and Adaptive Guard for Safe Text-to-Image and Video Generation☆36Updated 3 months ago
- Official implementation of "Interpreting CLIP's Image Representation via Text-Based Decomposition"☆208Updated 5 months ago
- Code for the paper "If at First You Don't Succeed, Try, Try Again: Faithful Diffusion-based Text-to-Image Generation by Selection"☆27Updated last year
- Visual Programming for Text-to-Image Generation and Evaluation (NeurIPS 2023)☆56Updated last year
- [CVPR 2025] FLAIR: VLM with Fine-grained Language-informed Image Representations☆63Updated 3 weeks ago
- This repo contains the official implementation of ICLR 2024 paper "Is ImageNet worth 1 video? Learning strong image encoders from 1 long …☆87Updated 11 months ago
- Official implementation of "Why are Visually-Grounded Language Models Bad at Image Classification?" (NeurIPS 2024)☆79Updated 6 months ago
- Code for the paper - ConceptPrune: Concept Editing in Diffusion Models via Skilled Neuron Pruning☆18Updated 8 months ago
- [NeurIPS 2024] Calibrated Self-Rewarding Vision Language Models☆71Updated 10 months ago
- ☆31Updated 3 months ago
- [CVPR 2024] Improving language-visual pretraining efficiency by performing cluster-based masking on images.☆27Updated 11 months ago
- Rare-to-Frequent (R2F), ICLR'25, Spotlight☆41Updated this week
- Official implementation of "Describing Differences in Image Sets with Natural Language" (CVPR 2024 Oral)☆118Updated last year
- Official code repo of PIN: Positional Insert Unlocks Object Localisation Abilities in VLMs☆26Updated 3 months ago
- [ECCV 2024] Official Release of SILC: Improving vision language pretraining with self-distillation☆42Updated 6 months ago
- [ICML 2024] On Discrete Prompt Optimization for Diffusion Models - Google☆53Updated 8 months ago
- Matryoshka Multimodal Models☆99Updated 3 months ago
- [CVPR 2024] The official implementation of the paper "Synthesize, Diagnose, and Optimize: Towards Fine-Grained Vision-Language Understanding"☆41Updated 2 months ago
- [CVPR 2025] Recurrence-Enhanced Vision-and-Language Transformers for Robust Multimodal Document Retrieval☆14Updated 3 weeks ago
- NegCLIP.☆31Updated 2 years ago
- [ICLR 2025] VL-ICL Bench: The Devil in the Details of Multimodal In-Context Learning☆52Updated 2 months ago
- [CVPR2024 Highlight] Official implementation for Transferable Visual Prompting. The paper "Exploring the Transferability of Visual Prompt…☆39Updated 4 months ago
- Code for "AVG-LLaVA: A Multimodal Large Model with Adaptive Visual Granularity"☆28Updated 6 months ago