mlfoundations / datacomp
DataComp: In search of the next generation of multimodal datasets
⭐ 710, updated last month
Alternatives and similar repositories for datacomp
Users interested in datacomp are comparing it to the repositories listed below.
- CLIP-like model evaluation (⭐ 717, updated last week)
- 🧀 Code and models for the ICML 2023 paper "Grounding Language Models to Images for Multimodal Inputs and Outputs" (⭐ 482, updated last year)
- [NeurIPS 2023] Official implementation of the paper "An Inverse Scaling Law for CLIP Training" (⭐ 314, updated last year)
- ICLR 2024 Spotlight: curation/training code, metadata, distribution, and pre-trained models for MetaCLIP; CVPR 2024: MoDE: CLIP Data Expert… (⭐ 1,450, updated 2 months ago)
- ⭐ 613, updated last year
- 🐟 Code and models for the NeurIPS 2023 paper "Generating Images with Multimodal Language Models" (⭐ 455, updated last year)
- Code used to create OBELICS, an open, massive, and curated collection of interleaved image-text web documents containing 141M d… (⭐ 202, updated 9 months ago)
- An open-source implementation of "Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning", an all-new multimodal … (⭐ 360, updated last year)
- Robust fine-tuning of zero-shot models (⭐ 705, updated 3 years ago)
- When do we not need larger vision models? (⭐ 393, updated 3 months ago)
- Implementation of 🦩 Flamingo, DeepMind's state-of-the-art few-shot visual question answering attention network, in PyTorch (⭐ 1,241, updated 2 years ago)
- Official implementation of SEED-LLaMA (ICLR 2024) (⭐ 613, updated 8 months ago)
- (CVPR 2024) A benchmark for evaluating multimodal LLMs using multiple-choice questions (⭐ 340, updated 4 months ago)
- Official repository for the LENS (Large Language Models Enhanced to See) system (⭐ 350, updated last year)
- Open reproduction of MUSE for fast text-to-image generation (⭐ 350, updated last year)
- Easily create large video datasets from video URLs (⭐ 611, updated 10 months ago)
- MultimodalC4 is a multimodal extension of c4 that interleaves millions of images with text (⭐ 930, updated 2 months ago)
- LLaVA-Plus: Large Language and Vision Assistants that Plug and Learn to Use Skills (⭐ 743, updated last year)
- MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities (ICML 2024) (⭐ 301, updated 4 months ago)
- [COLM 2024] LoraHub: Efficient Cross-Task Generalization via Dynamic LoRA Composition (⭐ 635, updated 10 months ago)
- Official code for VisProg (CVPR 2023 Best Paper!) (⭐ 725, updated 9 months ago)
- Large-scale text-video dataset of 10 million captioned short videos (⭐ 639, updated 9 months ago)
- Conceptual 12M: a dataset of (image-URL, caption) pairs collected for vision-and-language pre-training (⭐ 391, updated 2 years ago)
- GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest (⭐ 528, updated 11 months ago)
- Implementation of the DeepMind Flamingo vision-language model, based on Hugging Face language models and ready for training (⭐ 167, updated 2 years ago)
- Official repository of ChatCaptioner (⭐ 463, updated 2 years ago)
- LLaVA-UHD v2: an MLLM Integrating High-Resolution Semantic Pyramid via Hierarchical Window Transformer (⭐ 376, updated last month)
- Get hundreds of millions of image+URL pairs from the crawling-at-home dataset and preprocess them (⭐ 219, updated last year)
- Research trends in LLM-guided multimodal learning (⭐ 357, updated last year)
- Learning from synthetic data - code and models (⭐ 315, updated last year)