Code release for "Improved baselines for vision-language pre-training"
☆62May 6, 2024Updated last year
Alternatives and similar repositories for clip-rocket
Users that are interested in clip-rocket are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- This is the implementation of CounterCurate, the data curation pipeline of both physical and semantic counterfactual image-caption pairs.☆19Jun 27, 2024Updated last year
- In this codebase we establish a benchmark for egocentric user adaptation based on Ego4d.First, we start from a population model which ha…☆15Jan 16, 2025Updated last year
- Project for SNARE benchmark☆11Jun 5, 2024Updated last year
- [ICCV 2023] Going Beyond Nouns With Vision & Language Models Using Synthetic Data☆13Sep 30, 2023Updated 2 years ago
- Removing Cost Volumes from Optical Flow Estimators (ICCV 2025 Oral)☆37Dec 2, 2025Updated 4 months ago
- Wordpress hosting with auto-scaling - Free Trial • AdFully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
- Code for “Pretrained Language Models as Visual Planners for Human Assistance”☆62Jun 12, 2023Updated 2 years ago
- An essential implementation of BYOL in PyTorch + PyTorch Lightning☆51Jul 15, 2021Updated 4 years ago
- Experiments for "A Closer Look at In-Context Learning under Distribution Shifts"☆19May 29, 2023Updated 2 years ago
- Generalizing from SIMPLE to HARD Visual Reasoning: Can We Mitigate Modality Imbalance in VLMs?☆18Jun 3, 2025Updated 10 months ago
- Code and datasets for "Text encoders are performance bottlenecks in contrastive vision-language models". Coming soon!☆11May 24, 2023Updated 2 years ago
- ☆10Jul 5, 2024Updated last year
- ☆62Jun 16, 2023Updated 2 years ago
- Repository for the paper: dense and aligned captions (dac) promote compositional reasoning in vl models☆28Nov 29, 2023Updated 2 years ago
- CLIP-Mamba: CLIP Pretrained Mamba Models with OOD and Hessian Evaluation☆82Aug 15, 2024Updated last year
- Bare Metal GPUs on DigitalOcean Gradient AI • AdPurpose-built for serious AI teams training foundational models, running large-scale inference, and pushing the boundaries of what's possible.
- VisualGPTScore for visio-linguistic reasoning☆27Oct 7, 2023Updated 2 years ago
- Code for paper "Point and Ask: Incorporating Pointing into Visual Question Answering"☆19Oct 4, 2022Updated 3 years ago
- This is the repository for "SELECT: A Large-Scale Benchmark of Data Curation Strategies for Image Recognition"☆16Oct 8, 2024Updated last year
- ☆20Apr 23, 2024Updated last year
- Official Implementation of "CLEFT: Language-Image Contrastive Learning with Efficient Large Language Model and Prompt Fine-Tuning" on MIC…☆18Feb 12, 2025Updated last year
- Code for ICCV 2023 paper ✨ "StylerDALLE: Language-Guided Style Transfer Using a Vector-Quantized Tokenizer of a Large-Scale Generative Mo…☆18Jan 25, 2024Updated 2 years ago
- Enhancing Multimodal Compositional Reasoning of Visual Language Models with Generative Negative Mining, WACV 2024☆13Jan 3, 2024Updated 2 years ago
- The offical Pytorch code for "Continual Attentive Fusion for Incremental Learning in Semantic Segmentation"☆16Apr 8, 2022Updated 4 years ago
- If CLIP Could Talk: Understanding Vision-Language Model Representations Through Their Preferred Concept Descriptions☆17Apr 4, 2024Updated 2 years ago
- AI Agents on DigitalOcean Gradient AI Platform • AdBuild production-ready AI agents using customizable tools or access multiple LLMs through a single endpoint. Create custom knowledge bases or connect external data.
- [TACL/EMNLP'24] Do Vision and Language Models Share Concepts? A Vector Space Alignment Study☆16Nov 22, 2024Updated last year
- PyTorch implementation of "Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning" with DDP and Apex AMP☆84Sep 16, 2020Updated 5 years ago
- The code release for "Variational Structured Attention Networks for Visual Dense Representation Learning"☆14Nov 28, 2022Updated 3 years ago
- ☆12Sep 6, 2023Updated 2 years ago
- Code and model definition for converting the official BYOL weights to PyTorch☆33Nov 23, 2021Updated 4 years ago
- Adversarial Robustness on In- and Out-Distribution Improves Explainability☆12Feb 10, 2022Updated 4 years ago
- ☆13Mar 25, 2021Updated 5 years ago
- ☆17Dec 13, 2023Updated 2 years ago
- [EMNLP 2024] Official code for "Beyond Embeddings: The Promise of Visual Table in Multi-Modal Models"☆20Oct 17, 2024Updated last year
- Serverless GPU API endpoints on Runpod - Bonus Credits • AdSkip the infrastructure headaches. Auto-scaling, pay-as-you-go, no-ops approach lets you focus on innovating your application.
- 📍 Official repository of paper "ProtoCLIP: Prototypical Contrastive Language Image Pretraining" (IEEE TNNLS 2023)☆55Nov 8, 2023Updated 2 years ago
- SIEVE: Multimodal Dataset Pruning using Image-Captioning Models (CVPR 2024)☆19Apr 28, 2024Updated last year
- Code release for "Semi-supervised learning made simple with self-supervised clustering"☆62Jun 13, 2023Updated 2 years ago
- This is the repository for TimelineQA, a benchmark for querying lifelogs.☆26Jul 5, 2023Updated 2 years ago
- A list of (detailed, non-stochastic) action potential models, with links to papers, source code, CellML and Myokit implementations☆12Mar 16, 2026Updated 3 weeks ago
- Pytorch code for "Improving Self-Supervised Learning by Characterizing Idealized Representations"☆42Nov 27, 2022Updated 3 years ago
- ☆37Oct 7, 2023Updated 2 years ago