kohjingyu / fromage
🧀 Code and models for the ICML 2023 paper "Grounding Language Models to Images for Multimodal Inputs and Outputs".
☆482 · Updated last year
Alternatives and similar repositories for fromage
Users interested in fromage are comparing it to the libraries listed below.
- Code and models for the NeurIPS 2023 paper "Generating Images with Multimodal Language Models". ☆459 · Updated last year
- DataComp: In search of the next generation of multimodal datasets ☆724 · Updated 2 months ago
- Official Repository of ChatCaptioner ☆464 · Updated 2 years ago
- This is the official repository for the LENS (Large Language Models Enhanced to See) system. ☆352 · Updated last year
- [NeurIPS 2023] This repository includes the official implementation of our paper "An Inverse Scaling Law for CLIP Training" ☆315 · Updated last year
- Code release for "Learning Video Representations from Large Language Models" ☆526 · Updated last year
- Implementation of the DeepMind Flamingo vision-language model, based on Hugging Face language models and ready for training ☆167 · Updated 2 years ago
- ☆616 · Updated last year
- GIT: A Generative Image-to-text Transformer for Vision and Language ☆572 · Updated last year
- (CVPR 2024) A benchmark for evaluating Multimodal LLMs using multiple-choice questions. ☆345 · Updated 6 months ago
- CLIP-like model evaluation ☆740 · Updated last month
- ☆228 · Updated last year
- MultimodalC4 is a multimodal extension of c4 that interleaves millions of images with text. ☆933 · Updated 4 months ago
- Implementation of 🦩 Flamingo, state-of-the-art few-shot visual question answering attention net out of DeepMind, in PyTorch ☆1,249 · Updated 2 years ago
- [NeurIPS 2023] Official implementations of "Cheap and Quick: Efficient Vision-Language Instruction Tuning for Large Language Models" ☆521 · Updated last year
- An open source implementation of "Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning", an all-new multi modal … ☆362 · Updated last year
- MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities (ICML 2024) ☆305 · Updated 6 months ago
- Conceptual 12M is a dataset containing (image-URL, caption) pairs collected for vision-and-language pre-training. ☆396 · Updated last week
- Code used for the creation of OBELICS, an open, massive and curated collection of interleaved image-text web documents, containing 141M d… ☆206 · Updated 10 months ago
- Code/Data for the paper: "LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding" ☆267 · Updated last year
- Research Trends in LLM-guided Multimodal Learning. ☆358 · Updated last year
- Language Models Can See: Plugging Visual Controls in Text Generation ☆257 · Updated 3 years ago
- Multi-modality pre-training ☆496 · Updated last year
- Official implementation of SEED-LLaMA (ICLR 2024). ☆618 · Updated 10 months ago
- Open LLaMA Eyes to See the World ☆174 · Updated 2 years ago
- Easily create large video datasets from video URLs ☆617 · Updated 11 months ago
- Densely Captioned Images (DCI) dataset repository. ☆186 · Updated last year
- Open reproduction of MUSE for fast text2image generation. ☆354 · Updated last year
- MMICL, a state-of-the-art VLM with in-context learning ability, from PKU ☆352 · Updated last year
- Large-scale text-video dataset. 10 million captioned short videos. ☆646 · Updated 11 months ago