facebookresearch / MetaCLIP
ICLR 2024 Spotlight: curation/training code, metadata, distribution and pre-trained models for MetaCLIP; CVPR 2024: MoDE: CLIP Data Experts via Clustering
☆1,371 · Updated last week
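For context, the MetaCLIP checkpoints are positioned as drop-in CLIP replacements and, per the MetaCLIP README, can be loaded through OpenCLIP. Below is a minimal zero-shot classification sketch, assuming the `metaclip_400m` pretrained tag for `ViT-B-32-quickgelu` is available in the installed `open_clip_torch` (check `open_clip.list_pretrained()` if unsure) and using a hypothetical local image `cat.jpg`:

```python
# Minimal zero-shot classification sketch with a MetaCLIP checkpoint loaded via OpenCLIP.
# Assumes open_clip_torch is installed and exposes the 'metaclip_400m' tag for 'ViT-B-32-quickgelu'.
import torch
from PIL import Image
import open_clip

model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32-quickgelu", pretrained="metaclip_400m"
)
tokenizer = open_clip.get_tokenizer("ViT-B-32-quickgelu")
model.eval()

image = preprocess(Image.open("cat.jpg")).unsqueeze(0)  # hypothetical local image
text = tokenizer(["a photo of a cat", "a photo of a dog"])

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    # Normalize, then take cosine similarity and softmax over candidate captions.
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print(probs)  # e.g. high probability on "a photo of a cat" if the image is a cat
```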
Alternatives and similar repositories for MetaCLIP:
Users interested in MetaCLIP are comparing it to the libraries listed below.
- DataComp: In search of the next generation of multimodal datasets · ☆687 · Updated last year
- CLIP-like model evaluation · ☆677 · Updated last month
- Implementation of 🦩 Flamingo, state-of-the-art few-shot visual question answering attention net out of DeepMind, in PyTorch · ☆1,235 · Updated 2 years ago
- [CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses tha… · ☆852 · Updated 4 months ago
- Hiera: A fast, powerful, and simple hierarchical vision transformer. · ☆958 · Updated last year
- Official codebase used to develop Vision Transformer, SigLIP, MLP-Mixer, LiT and more. · ☆2,735 · Updated last week
- This repository provides the code and model checkpoints for the AIMv1 and AIMv2 research projects. · ☆1,241 · Updated 4 months ago
- Robust fine-tuning of zero-shot models · ☆681 · Updated 2 years ago
- [CVPR 2023] Official implementation of X-Decoder for generalized decoding for pixel, image and language · ☆1,309 · Updated last year
- VisionLLM Series · ☆1,028 · Updated 3 weeks ago
- Emu Series: Generative Multimodal Models from BAAI · ☆1,695 · Updated 5 months ago
- ☆602 · Updated last year
- Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation · ☆1,620 · Updated 7 months ago
- A collection of papers on the topic of "Computer Vision in the Wild (CVinW)" · ☆1,264 · Updated last year
- Easily compute CLIP embeddings and build a CLIP retrieval system with them · ☆2,517 · Updated 11 months ago
- LLaVA-Plus: Large Language and Vision Assistants that Plug and Learn to Use Skills · ☆733 · Updated last year
- LLaVA-Interactive-Demo · ☆366 · Updated 7 months ago
- EVA Series: Visual Representation Fantasies from BAAI · ☆2,453 · Updated 7 months ago
- 【ICLR 2024 🔥】 Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment · ☆795 · Updated 11 months ago
- Grounded Language-Image Pre-training · ☆2,356 · Updated last year
- [CVPR 2024] Alpha-CLIP: A CLIP Model Focusing on Wherever You Want · ☆792 · Updated 7 months ago
- A method to increase the speed and lower the memory footprint of existing vision transformers. · ☆1,025 · Updated 9 months ago
- PyTorch implementation of "V*: Guided Visual Search as a Core Mechanism in Multimodal LLMs" · ☆574 · Updated last year
- 4M: Massively Multimodal Masked Modeling · ☆1,701 · Updated 2 weeks ago
- ☆772 · Updated 8 months ago
- 🧀 Code and models for the ICML 2023 paper "Grounding Language Models to Images for Multimodal Inputs and Outputs". · ☆479 · Updated last year
- When do we not need larger vision models? · ☆380 · Updated last month
- Code and models for the NeurIPS 2023 paper "Generating Images with Multimodal Language Models". · ☆448 · Updated last year
- This is the official repository for the LENS (Large Language Models Enhanced to See) system. · ☆353 · Updated last year
- TorchMultimodal is a PyTorch library for training state-of-the-art multimodal multi-task models at scale. · ☆1,561 · Updated last week