kohjingyu / fromage
🧀 Code and models for the ICML 2023 paper "Grounding Language Models to Images for Multimodal Inputs and Outputs".
☆482 · Updated last year
Alternatives and similar repositories for fromage
Users who are interested in fromage are comparing it to the libraries listed below.
- 🐟 Code and models for the NeurIPS 2023 paper "Generating Images with Multimodal Language Models". · ☆465 · Updated last year
- DataComp: In search of the next generation of multimodal datasets · ☆745 · Updated 6 months ago
- This is the official repository for the LENS (Large Language Models Enhanced to See) system. · ☆354 · Updated 3 months ago
- [NeurIPS 2023] This repository includes the official implementation of our paper "An Inverse Scaling Law for CLIP Training" · ☆318 · Updated last year
- Implementation of the deepmind Flamingo vision-language model, based on Hugging Face language models and ready for training · ☆167 · Updated 2 years ago
- Official Repository of ChatCaptioner · ☆466 · Updated 2 years ago
- Code used for the creation of OBELICS, an open, massive and curated collection of interleaved image-text web documents, containing 141M d… · ☆206 · Updated last year
- ☆227 · Updated last year
- Code release for "Learning Video Representations from Large Language Models" · ☆537 · Updated 2 years ago
- ☆628 · Updated last year
- Conceptual 12M is a dataset containing (image-URL, caption) pairs collected for vision-and-language pre-training. · ☆405 · Updated 3 months ago
- Implementation of 🦩 Flamingo, state-of-the-art few-shot visual question answering attention net out of Deepmind, in Pytorch · ☆1,266 · Updated 3 years ago
- GIT: A Generative Image-to-text Transformer for Vision and Language · ☆575 · Updated last year
- An open source implementation of "Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning", an all-new multi modal … · ☆361 · Updated last year
- MultimodalC4 is a multimodal extension of c4 that interleaves millions of images with text. · ☆941 · Updated 7 months ago
- CLIP-like model evaluation · ☆780 · Updated 2 months ago
- (CVPR 2024) A benchmark for evaluating Multimodal LLMs using multiple-choice questions. · ☆352 · Updated 9 months ago
- Language Models Can See: Plugging Visual Controls in Text Generation · ☆259 · Updated 3 years ago
- MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities (ICML 2024) · ☆310 · Updated 9 months ago
- Official implementation of SEED-LLaMA (ICLR 2024). · ☆631 · Updated last year
- Research Trends in LLM-guided Multimodal Learning. · ☆355 · Updated 2 years ago
- Open reproduction of MUSE for fast text2image generation. · ☆355 · Updated last year
- Easily create large video dataset from video urls · ☆634 · Updated last year
- Get hundred of million of image+url from the crawling at home dataset and preprocess them · ☆222 · Updated last year
- Large-scale text-video dataset. 10 million captioned short videos. · ☆658 · Updated last year
- Code/Data for the paper: "LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding" · ☆268 · Updated last year
- [NeurIPS 2023] Official implementations of "Cheap and Quick: Efficient Vision-Language Instruction Tuning for Large Language Models" · ☆522 · Updated last year
- Research code for pixel-based encoders of language (PIXEL) · ☆340 · Updated 3 months ago
- Densely Captioned Images (DCI) dataset repository. · ☆191 · Updated last year
- Multi-modality pre-training · ☆503 · Updated last year