The official repo for the paper "VeCLIP: Improving CLIP Training via Visual-enriched Captions"
☆251Jan 22, 2025Updated last year
Alternatives and similar repositories for ml-veclip
Users who are interested in ml-veclip are comparing it to the libraries listed below.
- ☆59Mar 14, 2024Updated last year
- Repository for the paper: "TiC-CLIP: Continual Training of CLIP Models" ICLR 2024☆111Jun 11, 2024Updated last year
- Repository for the paper: "Dense and Aligned Captions (DAC) Promote Compositional Reasoning in VL Models"☆27Nov 29, 2023Updated 2 years ago
- Densely Captioned Images (DCI) dataset repository.☆197Jul 1, 2024Updated last year
- [ECCV 2024] official code for "Long-CLIP: Unlocking the Long-Text Capability of CLIP"☆893Aug 13, 2024Updated last year
- Code base of SynthCLIP: CLIP training with purely synthetic text-image pairs from LLMs and TTIs.☆102Mar 23, 2025Updated 11 months ago
- This repository provides the code and model checkpoints for AIMv1 and AIMv2 research projects.☆1,402Aug 4, 2025Updated 7 months ago
- NeurIPS 2025 Spotlight; ICLR2024 Spotlight; CVPR 2024; EMNLP 2024☆1,815Nov 27, 2025Updated 3 months ago
- Load any clip model with a standardized interface☆22Oct 20, 2025Updated 4 months ago
- [NeurIPS 2023] Text data, code and pre-trained models for paper "Improving CLIP Training with Language Rewrites"☆289Jan 14, 2024Updated 2 years ago
- [ICML 2025] This is the official repository of our paper "What If We Recaption Billions of Web Images with LLaMA-3?"☆149Jun 13, 2024Updated last year
- [CVPR 2024] CapsFusion: Rethinking Image-Text Data at Scale☆213Feb 27, 2024Updated 2 years ago
- MultimodalC4 is a multimodal extension of C4 that interleaves millions of images with text.☆953Mar 19, 2025Updated 11 months ago
- This repository contains the official implementation of the research papers, "MobileCLIP" CVPR 2024 and "MobileCLIP2" TMLR August 2025☆1,443Oct 9, 2025Updated 4 months ago
- Filtering, Distillation, and Hard Negatives for Vision-Language Pre-Training☆141Dec 16, 2025Updated 2 months ago
- Tool for exporting Apple Neural Engine-accelerated versions of transformers models on HuggingFace Hub.☆13May 2, 2023Updated 2 years ago
- 4M: Massively Multimodal Masked Modeling☆1,788Jun 2, 2025Updated 9 months ago
- DataComp: In search of the next generation of multimodal datasets☆772Apr 28, 2025Updated 10 months ago
- When do we not need larger vision models?☆413Feb 8, 2025Updated last year
- [CVPR 2024] Official implementation of "ViTamin: Designing Scalable Vision Models in the Vision-language Era"☆211Jun 9, 2024Updated last year
- Code for T-MARS data filtering☆35Aug 23, 2023Updated 2 years ago
- ☆39Apr 27, 2024Updated last year
- DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception☆159Dec 6, 2024Updated last year
- Tune-Mode ConvBN Blocks For Efficient Transfer Learning☆18Aug 1, 2023Updated 2 years ago
- Repository for the paper: Teaching Structured Vision & Language Concepts to Vision & Language Models☆48Sep 25, 2023Updated 2 years ago
- LLM2CLIP significantly improves already state-of-the-art CLIP models.☆630Feb 1, 2026Updated last month
- Grounded Language-Image Pre-training☆2,575Jan 24, 2024Updated 2 years ago
- An open source implementation of CLIP.☆13,430Updated this week
- [NeurIPS2022] This is the official implementation of the paper "Expediting Large-Scale Vision Transformer for Dense Prediction without Fi…☆86Oct 29, 2023Updated 2 years ago
- Official code for Paper "Mantis: Multi-Image Instruction Tuning" [TMLR 2024 Best Paper]☆240Jan 3, 2026Updated 2 months ago
- Contains Colab notebooks that show cool use cases of different GCP ML APIs.☆10Nov 5, 2020Updated 5 years ago
- ☆10Jul 5, 2024Updated last year
- Official Repository of ChatCaptioner☆469Apr 13, 2023Updated 2 years ago
- [NeurIPS 2022] Official code for "Focal Modulation Networks"☆751Nov 7, 2023Updated 2 years ago
- SuperStyleNet: Deep Image Synthesis with Superpixel Based Style Encoder (BMVC 2021)☆27Dec 28, 2021Updated 4 years ago
- COYO-700M: Large-scale Image-Text Pair Dataset☆1,252Nov 30, 2022Updated 3 years ago
- A huge dataset for Document Visual Question Answering☆20Jul 29, 2024Updated last year
- [NeurIPS 2023] This repository includes the official implementation of our paper "An Inverse Scaling Law for CLIP Training"☆319Jun 3, 2024Updated last year
- ☆91Jan 4, 2024Updated 2 years ago