mlfoundations / datacomp
DataComp: In search of the next generation of multimodal datasets
⭐ 719 · Updated last month
Alternatives and similar repositories for datacomp
Users interested in datacomp are comparing it to the libraries listed below.
- CLIP-like model evaluation · ⭐ 726 · Updated last week
- Code and models for the ICML 2023 paper "Grounding Language Models to Images for Multimodal Inputs and Outputs" · ⭐ 482 · Updated last year
- ⭐ 613 · Updated last year
- ICLR 2024 Spotlight: curation/training code, metadata, distribution, and pre-trained models for MetaCLIP; CVPR 2024: MoDE: CLIP Data Expert… · ⭐ 1,460 · Updated 3 months ago
- Code and models for the NeurIPS 2023 paper "Generating Images with Multimodal Language Models" · ⭐ 456 · Updated last year
- [NeurIPS 2023] Official implementation of the paper "An Inverse Scaling Law for CLIP Training" · ⭐ 314 · Updated last year
- Robust fine-tuning of zero-shot models · ⭐ 717 · Updated 3 years ago
- [COLM 2024] LoraHub: Efficient Cross-Task Generalization via Dynamic LoRA Composition · ⭐ 640 · Updated 11 months ago
- Implementation of 🦩 Flamingo, a state-of-the-art few-shot visual question answering attention network from DeepMind, in PyTorch · ⭐ 1,248 · Updated 2 years ago
- When do we not need larger vision models? · ⭐ 395 · Updated 4 months ago
- MultimodalC4 is a multimodal extension of c4 that interleaves millions of images with text · ⭐ 932 · Updated 3 months ago
- Open reproduction of MUSE for fast text2image generation · ⭐ 351 · Updated last year
- Research Trends in LLM-guided Multimodal Learning · ⭐ 358 · Updated last year
- An open-source implementation of "Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning", an all-new multi-modal… · ⭐ 361 · Updated last year
- Code used for the creation of OBELICS, an open, massive, and curated collection of interleaved image-text web documents, containing 141M d… · ⭐ 202 · Updated 9 months ago
- (CVPR 2024) A benchmark for evaluating Multimodal LLMs using multiple-choice questions · ⭐ 343 · Updated 5 months ago
- Official implementation of SEED-LLaMA (ICLR 2024) · ⭐ 614 · Updated 9 months ago
- [NeurIPS 2023] Official implementation of "Cheap and Quick: Efficient Vision-Language Instruction Tuning for Large Language Models" · ⭐ 520 · Updated last year
- GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest · ⭐ 533 · Updated 3 weeks ago
- The official repository for the LENS (Large Language Models Enhanced to See) system · ⭐ 351 · Updated last year
- Learning from synthetic data: code and models · ⭐ 318 · Updated last year
- [CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses tha… · ⭐ 889 · Updated 2 weeks ago
- Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models side-by-side while providing imag… · ⭐ 528 · Updated last year
- TorchMultimodal is a PyTorch library for training state-of-the-art multimodal multi-task models at scale · ⭐ 1,613 · Updated last week
- LaVIT: Empower the Large Language Model to Understand and Generate Visual Content · ⭐ 584 · Updated 8 months ago
- MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities (ICML 2024) · ⭐ 303 · Updated 5 months ago
- The official repo for the paper "VeCLIP: Improving CLIP Training via Visual-enriched Captions" · ⭐ 244 · Updated 5 months ago
- E5-V: Universal Embeddings with Multimodal Large Language Models · ⭐ 254 · Updated 6 months ago
- Large-scale text-video dataset: 10 million captioned short videos · ⭐ 642 · Updated 10 months ago
- VisionLLaMA: A Unified LLaMA Backbone for Vision Tasks · ⭐ 385 · Updated 11 months ago