kohjingyu / fromage
🧀 Code and models for the ICML 2023 paper "Grounding Language Models to Images for Multimodal Inputs and Outputs".
⭐ 482 · Updated last year
Alternatives and similar repositories for fromage
Users interested in fromage are comparing it to the libraries listed below.
- Code and models for the NeurIPS 2023 paper "Generating Images with Multimodal Language Models". (⭐ 464, updated last year)
- DataComp: In search of the next generation of multimodal datasets (⭐ 743, updated 4 months ago)
- [NeurIPS 2023] Official implementation of the paper "An Inverse Scaling Law for CLIP Training" (⭐ 316, updated last year)
- Official repository for the LENS (Large Language Models Enhanced to See) system (⭐ 353, updated last month)
- Official repository of ChatCaptioner (⭐ 465, updated 2 years ago)
- Implementation of the DeepMind Flamingo vision-language model, based on Hugging Face language models and ready for training (⭐ 168, updated 2 years ago)
- GIT: A Generative Image-to-text Transformer for Vision and Language (⭐ 572, updated last year)
- (⭐ 228, updated last year)
- Implementation of 🦩 Flamingo, state-of-the-art few-shot visual question answering attention network out of DeepMind, in PyTorch (⭐ 1,261, updated 2 years ago)
- (⭐ 625, updated last year)
- Code release for "Learning Video Representations from Large Language Models" (⭐ 535, updated last year)
- An open-source implementation of "Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning", an all-new multimodal … (⭐ 363, updated last year)
- Language Models Can See: Plugging Visual Controls in Text Generation (⭐ 259, updated 3 years ago)
- Code used for the creation of OBELICS, an open, massive and curated collection of interleaved image-text web documents, containing 141M d… (⭐ 205, updated last year)
- CLIP-like model evaluation (⭐ 767, updated last month)
- MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities (ICML 2024) (⭐ 308, updated 7 months ago)
- MultimodalC4 is a multimodal extension of C4 that interleaves millions of images with text. (⭐ 938, updated 5 months ago)
- Conceptual 12M is a dataset containing (image-URL, caption) pairs collected for vision-and-language pre-training. (⭐ 401, updated 2 months ago)
- Open reproduction of MUSE for fast text2image generation (⭐ 356, updated last year)
- Research Trends in LLM-guided Multimodal Learning (⭐ 355, updated last year)
- [CVPR 2024] A benchmark for evaluating multimodal LLMs using multiple-choice questions (⭐ 348, updated 8 months ago)
- Code/data for the paper "LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding" (⭐ 269, updated last year)
- [NeurIPS 2023] Official implementation of "Cheap and Quick: Efficient Vision-Language Instruction Tuning for Large Language Models" (⭐ 524, updated last year)
- Implementation of "Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic" (⭐ 278, updated 3 years ago)
- Large-scale text-video dataset of 10 million captioned short videos (⭐ 657, updated last year)
- Official implementation of SEED-LLaMA (ICLR 2024) (⭐ 621, updated 11 months ago)
- GRiT: A Generative Region-to-text Transformer for Object Understanding (ECCV 2024) (⭐ 335, updated last year)
- Easily create large video datasets from video URLs (⭐ 633, updated last year)
- Densely Captioned Images (DCI) dataset repository (⭐ 192, updated last year)
- Open LLaMA Eyes to See the World (⭐ 174, updated 2 years ago)