kohjingyu / fromage
🧀 Code and models for the ICML 2023 paper "Grounding Language Models to Images for Multimodal Inputs and Outputs".
⭐ 478 · Updated last year
Alternatives and similar repositories for fromage:
Users interested in fromage are comparing it to the libraries listed below.
- Code and models for the NeurIPS 2023 paper "Generating Images with Multimodal Language Models". ⭐ 448 · Updated last year
- DataComp: In search of the next generation of multimodal datasets ⭐ 684 · Updated last year
- [NeurIPS 2023] This repository includes the official implementation of our paper "An Inverse Scaling Law for CLIP Training" ⭐ 311 · Updated 9 months ago
- Implementation of the DeepMind Flamingo vision-language model, based on Hugging Face language models and ready for training ⭐ 166 · Updated last year
- Implementation of 🦩 Flamingo, state-of-the-art few-shot visual question answering attention net out of DeepMind, in PyTorch ⭐ 1,234 · Updated 2 years ago
- Official Repository of ChatCaptioner ⭐ 462 · Updated last year
- Code used for the creation of OBELICS, an open, massive and curated collection of interleaved image-text web documents, containing 141M d… ⭐ 197 · Updated 6 months ago
- This is the official repository for the LENS (Large Language Models Enhanced to See) system. ⭐ 352 · Updated last year
- MultimodalC4 is a multimodal extension of c4 that interleaves millions of images with text. ⭐ 918 · Updated 9 months ago
- [CVPR 2024] A benchmark for evaluating Multimodal LLMs using multiple-choice questions. ⭐ 332 · Updated last month
- Code release for "Learning Video Representations from Large Language Models" ⭐ 510 · Updated last year
- MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities (ICML 2024) ⭐ 287 · Updated last month
- Code/Data for the paper "LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding" ⭐ 265 · Updated 9 months ago
- Language Models Can See: Plugging Visual Controls in Text Generation ⭐ 257 · Updated 2 years ago
- ⭐ 225 · Updated last year
- CLIP-like model evaluation ⭐ 671 · Updated 3 weeks ago
- Open reproduction of MUSE for fast text2image generation. ⭐ 347 · Updated 9 months ago
- [ICLR 2024 & ECCV 2024] The All-Seeing Projects: Towards Panoptic Visual Recognition & Understanding and General Relation Comprehension of … ⭐ 475 · Updated 7 months ago
- [NeurIPS 2023] Official implementations of "Cheap and Quick: Efficient Vision-Language Instruction Tuning for Large Language Models" ⭐ 515 · Updated last year
- An open source implementation of "Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning", an all-new multi-modal … ⭐ 361 · Updated last year
- Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models side-by-side while providing imag… ⭐ 504 · Updated 10 months ago
- [COLM 2024] LoraHub: Efficient Cross-Task Generalization via Dynamic LoRA Composition ⭐ 617 · Updated 7 months ago
- Internet Explorer explores the web in a self-supervised manner to progressively find relevant examples that improve performance on a desi… ⭐ 163 · Updated 2 years ago
- Code for T-Few from "Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning" ⭐ 446 · Updated last year
- Conceptual 12M is a dataset containing (image-URL, caption) pairs collected for vision-and-language pre-training. ⭐ 384 · Updated 2 years ago
- A concise but complete implementation of CLIP with various experimental improvements from recent papers ⭐ 708 · Updated last year
- Densely Captioned Images (DCI) dataset repository. ⭐ 171 · Updated 8 months ago
- Experiments and data for the paper "When and why vision-language models behave like bags-of-words, and what to do about it?" Oral @ ICLR … ⭐ 268 · Updated last year
- GIT: A Generative Image-to-text Transformer for Vision and Language ⭐ 558 · Updated last year
- Official code for ReLoRA from the paper "Stack More Layers Differently: High-Rank Training Through Low-Rank Updates" ⭐ 446 · Updated 10 months ago