kohjingyu / fromage
Code and models for the ICML 2023 paper "Grounding Language Models to Images for Multimodal Inputs and Outputs".
☆482, updated last year
Alternatives and similar repositories for fromage
Users who are interested in fromage are comparing it to the repositories listed below.
- DataComp: In search of the next generation of multimodal datasets (☆731, updated 3 months ago)
- Code and models for the NeurIPS 2023 paper "Generating Images with Multimodal Language Models" (☆460, updated last year)
- This is the official repository for the LENS (Large Language Models Enhanced to See) system (☆352, updated 2 weeks ago)
- [NeurIPS 2023] This repository includes the official implementation of our paper "An Inverse Scaling Law for CLIP Training" (☆316, updated last year)
- Official Repository of ChatCaptioner (☆464, updated 2 years ago)
- Implementation of the DeepMind Flamingo vision-language model, based on Hugging Face language models and ready for training (☆167, updated 2 years ago)
- Conceptual 12M is a dataset containing (image-URL, caption) pairs collected for vision-and-language pre-training (☆394, updated 3 weeks ago)
- ☆227, updated last year
- Code release for "Learning Video Representations from Large Language Models" (☆529, updated last year)
- CLIP-like model evaluation (☆748, updated 2 weeks ago)
- ☆621, updated last year
- Code used for the creation of OBELICS, an open, massive and curated collection of interleaved image-text web documents, containing 141M d… (☆206, updated 11 months ago)
- GIT: A Generative Image-to-text Transformer for Vision and Language (☆572, updated last year)
- Implementation of Flamingo, state-of-the-art few-shot visual question answering attention net out of DeepMind, in PyTorch (☆1,256, updated 2 years ago)
- An open source implementation of "Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning", an all-new multi modal … (☆363, updated last year)
- Code/Data for the paper: "LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding" (☆269, updated last year)
- Open reproduction of MUSE for fast text2image generation (☆354, updated last year)
- MultimodalC4 is a multimodal extension of c4 that interleaves millions of images with text (☆936, updated 4 months ago)
- (CVPR 2024) A benchmark for evaluating Multimodal LLMs using multiple-choice questions (☆346, updated 6 months ago)
- MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities (ICML 2024) (☆307, updated 6 months ago)
- Official implementation of SEED-LLaMA (ICLR 2024) (☆619, updated 10 months ago)
- Research Trends in LLM-guided Multimodal Learning (☆357, updated last year)
- Language Models Can See: Plugging Visual Controls in Text Generation (☆258, updated 3 years ago)
- [NeurIPS 2023] Official implementations of "Cheap and Quick: Efficient Vision-Language Instruction Tuning for Large Language Models" (☆521, updated last year)
- Research code for pixel-based encoders of language (PIXEL) (☆338, updated 3 weeks ago)
- Large-scale text-video dataset. 10 million captioned short videos (☆651, updated 11 months ago)
- Multi-modality pre-training (☆501, updated last year)
- Experiments and data for the paper "When and why vision-language models behave like bags-of-words, and what to do about it?" Oral @ ICLR … (☆282, updated 2 years ago)
- GRiT: A Generative Region-to-text Transformer for Object Understanding (ECCV2024) (☆330, updated last year)
- MMICL, a state-of-the-art VLM with in-context learning ability, from ICL, PKU (☆352, updated last year)