An Enhanced CLIP Framework for Learning with Synthetic Captions
☆40Apr 18, 2025Updated 10 months ago
Alternatives and similar repositories for CLIPS
Users that are interested in CLIPS are comparing it to the libraries listed below
Sorting:
- Code for "AVG-LLaVA: A Multimodal Large Model with Adaptive Visual Granularity"☆33Oct 12, 2024Updated last year
- [CVPR 2024] This repository includes the official implementation our paper "Revisiting Adversarial Training at Scale"☆20Apr 21, 2024Updated last year
- [NeurIPS 2024] Official PyTorch implementation of "Improving Compositional Reasoning of CLIP via Synthetic Vision-Language Negatives"☆46Dec 1, 2024Updated last year
- [TMLR'24] This repository includes the official implementation our paper "FedConv: Enhancing Convolutional Neural Networks for Handling D…☆25Apr 30, 2024Updated last year
- A Novel Semantic Segmentation Network using Enhanced Boundaries in Cluttered Scenes (WACV 2025)☆11Aug 11, 2025Updated 6 months ago
- An official PyTorch implementation for CLIPPR☆30Jul 22, 2023Updated 2 years ago
- [ICML 2025] This is the official repository of our paper "What If We Recaption Billions of Web Images with LLaMA-3 ?"☆149Jun 13, 2024Updated last year
- (NeurIPS 2024) What Makes CLIP More Robust to Long-Tailed Pre-Training Data? A Controlled Study for Transferable Insights☆28Oct 28, 2024Updated last year
- [CVPR 2024] The official implementation of paper "synthesize, diagnose, and optimize: towards fine-grained vision-language understanding"☆52Jun 16, 2025Updated 8 months ago
- [ICCV 2023] Going Beyond Nouns With Vision & Language Models Using Synthetic Data☆14Sep 30, 2023Updated 2 years ago
- ☆11Oct 2, 2024Updated last year
- [ECCV 2024] Official PyTorch Implementation of "How Many Unicorns Are in This Image? A Safety Evaluation Benchmark for Vision LLMs"☆86Nov 28, 2023Updated 2 years ago
- Code and datasets for "Text encoders are performance bottlenecks in contrastive vision-language models". Coming soon!☆11May 24, 2023Updated 2 years ago
- ☆15Nov 30, 2023Updated 2 years ago
- This repository contains the code for our CVPR 2024 paper,☆15Aug 27, 2024Updated last year
- [TMLR 25] SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models☆149Oct 10, 2025Updated 4 months ago
- ☆11Oct 8, 2022Updated 3 years ago
- [TACL] Do Vision and Language Models Share Concepts? A Vector Space Alignment Study☆16Nov 22, 2024Updated last year
- [NLPCC'23] ZeroGen: Zero-shot Multimodal Controllable Text Generation with Multiple Oracles PyTorch Implementation☆14Oct 7, 2023Updated 2 years ago
- [CVPR 2024] Improving language-visual pretraining efficiency by perform cluster-based masking on images.☆31May 16, 2024Updated last year
- ☆18Sep 23, 2024Updated last year
- [COLING'25] HGCLIP: Exploring Vision-Language Models with Graph Representations for Hierarchical Understanding☆44Nov 30, 2024Updated last year
- CLAIR: A (surprisingly) simple semantic text metric with large language models.☆21Jan 28, 2024Updated 2 years ago
- Using LLMs and pre-trained caption models for super-human performance on image captioning.☆42Oct 13, 2023Updated 2 years ago
- [ECCV 2022] This repository includes the official implementation our paper "In Defense of Image Pre-Training for Spatiotemporal Recogniti…☆19Dec 22, 2022Updated 3 years ago
- On Efficient Language and Vision Assistants for Visually-Situated Natural Language Understanding: What Matters in Reading and Reasoning, …☆19Dec 16, 2024Updated last year
- Visual and Embodied Concepts evaluation benchmark☆21Oct 10, 2023Updated 2 years ago
- AlignCLIP: Improving Cross-Modal Alignment in CLIP (ICLR 2025)☆59Mar 1, 2025Updated last year
- Source code of our AAAI 2024 paper "Cross-Modal and Uni-Modal Soft-Label Alignment for Image-Text Retrieval"☆55Mar 28, 2024Updated last year
- [ICML 2024] SPP: Sparsity-Preserved Parameter-Efficient Fine-Tuning for Large Language Models☆21May 28, 2024Updated last year
- CirrMRI600+: Large Scale MRI Collection and Segmentation of Cirrhotic Liver☆23May 7, 2025Updated 9 months ago
- [ECCV 2024] ControlCap: Controllable Region-level Captioning☆80Oct 25, 2024Updated last year
- [NeurIPS 2023] This repository includes the official implementation of our paper "An Inverse Scaling Law for CLIP Training"☆319Jun 3, 2024Updated last year
- CLIP-MoE: Mixture of Experts for CLIP☆55Oct 10, 2024Updated last year
- [ECCV’24] Official repository for "BEAF: Observing Before-AFter Changes to Evaluate Hallucination in Vision-language Models"☆21Mar 26, 2025Updated 11 months ago
- [TMLR 2024] Official implementation of "Sight Beyond Text: Multi-Modal Training Enhances LLMs in Truthfulness and Ethics"☆20Sep 15, 2023Updated 2 years ago
- Code release for "Understanding Bias in Large-Scale Visual Datasets"☆22Dec 4, 2024Updated last year
- [NeurIPS 2024] Official PyTorch implementation of LoTLIP: Improving Language-Image Pre-training for Long Text Understanding☆50Jan 14, 2025Updated last year
- [NeurIPS2024] Repo for the paper `ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models'☆204Jul 17, 2025Updated 7 months ago