facebookresearch / MetaCLIP

ICLR 2024 Spotlight: curation/training code, metadata, distribution and pre-trained models for MetaCLIP; CVPR 2024: MoDE: CLIP Data Experts via Clustering

☆1,478

Alternatives and similar repositories for MetaCLIP

Users who are interested in MetaCLIP are comparing it to the repositories listed below.

apple / ml-aim
This repository provides the code and model checkpoints for AIMv1 and AIMv2 research projects.
☆1,331 · Updated 2 months ago
LAION-AI / CLIP_benchmark
CLIP-like model evaluation
☆740 · Updated last month
mlfoundations / datacomp
DataComp: In search of the next generation of multimodal datasets
☆724 · Updated 2 months ago
facebookresearch / hiera
Hiera: A fast, powerful, and simple hierarchical vision transformer.
☆1,004 · Updated last year
microsoft / X-Decoder
[CVPR 2023] Official Implementation of X-Decoder for generalized decoding for pixel, image and language
☆1,323 · Updated last year
mbzuai-oryx / groundingLMM
[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses tha…
☆894 · Updated last month
baaivision / Emu
Emu Series: Generative Multimodal Models from BAAI
☆1,736 · Updated 9 months ago
mlfoundations / wise-ft
Robust fine-tuning of zero-shot models
☆722 · Updated 3 years ago
lucidrains / flamingo-pytorch
Implementation of 🦩 Flamingo, state-of-the-art few-shot visual question answering attention net out of DeepMind, in PyTorch
☆1,249 · Updated 2 years ago
OpenGVLab / VisionLLM
VisionLLM Series
☆1,089 · Updated 4 months ago
OFA-Sys / ONE-PEACE
A general representation model across vision, audio, language modalities. Paper: ONE-PEACE: Exploring One General Representation Model To…
☆1,046 · Updated 9 months ago
google-research / big_vision
Official codebase used to develop Vision Transformer, SigLIP, MLP-Mixer, LiT and more.
☆3,016 · Updated 2 months ago
penghao-wu / vstar
PyTorch Implementation of "V* : Guided Visual Search as a Core Mechanism in Multimodal LLMs"
☆645 · Updated last year
allenai / unified-io-2
☆616 · Updated last year
facebookresearch / ToMe
A method to increase the speed and lower the memory footprint of existing vision transformers.
☆1,075 · Updated last year
google-research / pix2seq
Pix2Seq codebase: multi-tasks with generative modeling (autoregressive and diffusion)
☆917 · Updated last year
LLaVA-VL / LLaVA-Plus-Codebase
LLaVA-Plus: Large Language and Vision Assistants that Plug and Learn to Use Skills
☆751 · Updated last year
Computer-Vision-in-the-Wild / CVinW_Readings
A collection of papers on the topic of "Computer Vision in the Wild (CVinW)"
☆1,319 · Updated last year
cambrian-mllm / cambrian
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
☆1,925 · Updated 8 months ago
microsoft / GLIP
Grounded Language-Image Pre-training
☆2,464 · Updated last year
FoundationVision / LlamaGen
Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation
☆1,813 · Updated 11 months ago
bfshi / scaling_on_scales
When do we not need larger vision models?
☆403 · Updated 5 months ago
PKU-YuanGroup / LanguageBind
【ICLR 2024🔥】 Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment
☆822 · Updated last year
apple / ml-4m
4M: Massively Multimodal Masked Modeling
☆1,750 · Updated last month
allenai / visprog
Official code for VisProg (CVPR 2023 Best Paper!)
☆736 · Updated 10 months ago
NVlabs / RADIO
Official repository for "AM-RADIO: Reduce All Domains Into One"
☆1,256 · Updated 2 weeks ago
baaivision / EVA
EVA Series: Visual Representation Fantasies from BAAI
☆2,539 · Updated 11 months ago
LLaVA-VL / LLaVA-Interactive-Demo
LLaVA-Interactive-Demo
☆375 · Updated 11 months ago
allenai / mmc4
MultimodalC4 is a multimodal extension of C4 that interleaves millions of images with text.
☆933 · Updated 4 months ago
yzhuoning / Awesome-CLIP
Awesome list for research on CLIP (Contrastive Language-Image Pre-Training).
☆1,208 · Updated last year