UCSC-VLAA/Recap-DataComp-1B

[ICML 2025] This is the official repository of our paper "What If We Recaption Billions of Web Images with LLaMA-3?"

☆152

Alternatives and similar repositories for Recap-DataComp-1B

Users who are interested in Recap-DataComp-1B are comparing it to the repositories listed below.

UCSC-VLAA / CLIPS
An Enhanced CLIP Framework for Learning with Synthetic Captions
☆40 • Apr 18, 2025 • Updated last year
UCSC-VLAA / Image-Pretraining-for-Video
[ECCV 2022] This repository includes the official implementation of our paper "In Defense of Image Pre-Training for Spatiotemporal Recogniti…
☆19 • Dec 22, 2022 • Updated 3 years ago
UCSC-VLAA / EarthWhere
☆16 • Nov 15, 2025 • Updated 8 months ago
UCSC-VLAA / VLAA-GUI
Official implementation of the VLAA-GUI series
☆34 • Jun 20, 2026 • Updated last month
UCSC-VLAA / AdvXL
[CVPR 2024] This repository includes the official implementation of our paper "Revisiting Adversarial Training at Scale"
☆20 • Apr 21, 2024 • Updated 2 years ago
UCSC-VLAA / HQ-Edit
[ICLR 2025] HQ-Edit: A High-Quality and High-Coverage Dataset for General Image Editing
☆114 • Apr 18, 2024 • Updated 2 years ago
UCSC-VLAA / CLIPA
[NeurIPS 2023] This repository includes the official implementation of our paper "An Inverse Scaling Law for CLIP Training"
☆321 • Jun 3, 2024 • Updated 2 years ago
ImKeTT / ZeroGen
[NLPCC'23] ZeroGen: Zero-shot Multimodal Controllable Text Generation with Multiple Oracles PyTorch Implementation
☆14 • Oct 7, 2023 • Updated 2 years ago
UCSC-VLAA / MedVLSynther
[ICLR'26] MedVLSynther: Synthesizing High-Quality Visual Question Answering from Medical Documents with Generator-Verifier LMMs
☆19 • Nov 1, 2025 • Updated 8 months ago
UCSC-VLAA / EVP
[TMLR'24] This repository includes the official implementation of our paper "Unleashing the Power of Visual Prompting At the Pixel Level"
☆42 • Apr 30, 2024 • Updated 2 years ago
tsb0601 / MMVP
☆364 • Jan 27, 2024 • Updated 2 years ago
UCSC-VLAA / VLAA-Thinking
[TMLR 25] SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models
☆148 • Oct 10, 2025 • Updated 9 months ago
haoyu-bu / CAFe
Code for "CAFe: Unifying Representation and Generation with Contrastive-Autoregressive Finetuning"
☆33 • Mar 26, 2025 • Updated last year
ant-research / DreamLIP
[ECCV 2024] Official PyTorch implementation of DreamLIP: Language-Image Pre-training with Long Captions
☆138 • May 8, 2025 • Updated last year
UCSC-VLAA / Sight-Beyond-Text
[TMLR 2024] Official implementation of "Sight Beyond Text: Multi-Modal Training Enhances LLMs in Truthfulness and Ethics"
☆20 • Sep 15, 2023 • Updated 2 years ago
baaivision / DenseFusion
DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception
☆159 • Dec 6, 2024 • Updated last year
UCSC-VLAA / AttnGCG-attack
[TMLR 2025] Official implementation of AttnGCG: Enhancing Jailbreaking Attacks on LLMs with Attention Manipulation
☆27 • Jun 17, 2025 • Updated last year
yuecao0119 / MMInstruct
[SCIS 2024] The official implementation of the paper "MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Di…
☆64 • Nov 7, 2024 • Updated last year
bfshi / scaling_on_scales
When do we not need larger vision models?
☆420 • Feb 8, 2025 • Updated last year
ImKeTT / PCAE
[KBS] PCAE: A Framework of Plug-in Conditional Auto-Encoder for Controllable Text Generation PyTorch Implementation
☆26 • Apr 10, 2023 • Updated 3 years ago
OpenGVLab / OmniCorpus
[ICLR 2025 Spotlight] OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text
☆425 • May 5, 2025 • Updated last year
Yangyi-Chen / SOLO
[TMLR] Public code repo for the paper "A Single Transformer for Scalable Vision-Language Modeling"
☆150 • Nov 14, 2024 • Updated last year
tripletclip / TripletCLIP
[NeurIPS 2024] Official PyTorch implementation of "Improving Compositional Reasoning of CLIP via Synthetic Vision-Language Negatives"
☆48 • Dec 1, 2024 • Updated last year
UCSC-VLAA / FedConv
[TMLR'24] This repository includes the official implementation of our paper "FedConv: Enhancing Convolutional Neural Networks for Handling D…
☆25 • Apr 30, 2024 • Updated 2 years ago
apple / ml-veclip
The official repo for the paper "VeCLIP: Improving CLIP Training via Visual-enriched Captions"
☆253 • Jan 22, 2025 • Updated last year
ImKeTT / FET-LM
[TNNLS, to appear] FET-LM: Flow Enhanced Variational Auto-Encoder for Topic-Guided Language Modeling PyTorch Implementation
☆14 • Mar 4, 2023 • Updated 3 years ago
UCSC-VLAA / ReasoningEval
Official repo of "Knowledge or Reasoning? A Close Look at How LLMs Think Across Domains"
☆43 • Jun 6, 2025 • Updated last year
MCG-NJU / RGE
Reasoning Guided Embeddings: Leveraging MLLM Reasoning for Improved Multimodal Retrieval
☆15 • Nov 29, 2025 • Updated 7 months ago
CVMI-Lab / clip-beyond-tail
(NeurIPS 2024) What Makes CLIP More Robust to Long-Tailed Pre-Training Data? A Controlled Study for Transferable Insights
☆27 • Oct 28, 2024 • Updated last year
tiiuae / FineLIP
Code for FineLIP
☆43 • Nov 25, 2025 • Updated 7 months ago
zeyofu / BLINK_Benchmark
This repo contains evaluation code for the paper "BLINK: Multimodal Large Language Models Can See but Not Perceive". https://arxiv.or…
☆171 • Sep 27, 2025 • Updated 9 months ago
zeyofu / Commonsense-T2I
Code for Commonsense-T2I Challenge: Can Text-to-Image Generation Models Understand Commonsense? [COLM 2024]
☆24 • Aug 13, 2024 • Updated last year
microsoft / x-reasoner
X-Reasoner: Towards Generalizable Reasoning Across Modalities and Domains
☆49 • Feb 4, 2026 • Updated 5 months ago
cambrian-mllm / cambrian
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
☆2,008 • Nov 7, 2025 • Updated 8 months ago
mlfoundations / MINT-1T
🍃 MINT-1T: A one trillion token multimodal interleaved dataset.
☆833 • Jul 31, 2024 • Updated last year
facebookresearch / DCI
Densely Captioned Images (DCI) dataset repository.
☆197 • Jul 1, 2024 • Updated 2 years ago
ethanhe42 / dds
DDS: Delta Denoising Score PyTorch implementation
☆19 • Sep 2, 2023 • Updated 2 years ago
mu-cai / matryoshka-mm
Matryoshka Multimodal Models
☆123 • Jan 22, 2025 • Updated last year
hammoudhasan / SynthCLIP
Code base of SynthCLIP: CLIP training with purely synthetic text-image pairs from LLMs and TTIs.
☆104 • Mar 23, 2025 • Updated last year