dhansmair / flamingo-mini

Implementation of the DeepMind Flamingo vision-language model, based on Hugging Face language models and ready for training

☆167

Alternatives and similar repositories for flamingo-mini

Users who are interested in flamingo-mini are comparing it to the repositories listed below.

huggingface / OBELICS
Code used for the creation of OBELICS, an open, massive and curated collection of interleaved image-text web documents, containing 141M d…
☆206 Updated last year
microsoft / BridgeTower
Open source code for AAAI 2023 Paper "BridgeTower: Building Bridges Between Encoders in Vision-Language Representation Learning"
☆166 Updated 2 years ago
huggingface / m4-logs
M4 experiment logbook
☆57 Updated 2 years ago
kohjingyu / fromage
🧀 Code and models for the ICML 2023 paper "Grounding Language Models to Images for Multimodal Inputs and Outputs".
☆482 Updated last year
LAION-AI / General-GPT
☆65 Updated 2 years ago
microsoft / FIBER
Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone
☆129 Updated 2 years ago
feizc / Visual-LLaMA
Open LLaMA Eyes to See the World
☆174 Updated 2 years ago
sanjayss34 / codevqa
☆84 Updated 2 years ago
OFA-Sys / TouchStone
Touchstone: Evaluating Vision-Language Models by Language Models
☆83 Updated last year
yxuansu / MAGIC
Language Models Can See: Plugging Visual Controls in Text Generation
☆259 Updated 3 years ago
SALT-NLP / LLaVAR
Code/Data for the paper: "LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding"
☆268 Updated last year
PLUM-Lab / MultiInstruct
MultiInstruct: Improving Multi-Modal Zero-Shot Learning via Instruction Tuning
☆134 Updated 2 years ago
ZrrSkywalker / LLaMA-Adapter
Fine-tuning LLaMA to follow Instructions within 1 Hour and 1.2M Parameters
☆90 Updated 2 years ago
gregor-ge / mBLIP
☆87 Updated last year
LAION-AI / Big-Interleaved-Dataset
Big-Interleaved-Dataset
☆57 Updated 2 years ago
facebookresearch / diht
Filtering, Distillation, and Hard Negatives for Vision-Language Pre-Training
☆138 Updated 2 years ago
rowanz / merlot_reserve
Code release for "MERLOT Reserve: Neural Script Knowledge through Vision and Language and Sound"
☆144 Updated 3 years ago
nttmdlab-nlp / SlideVQA
SlideVQA: A Dataset for Document Visual Question Answering on Multiple Images (AAAI2023)
☆98 Updated 6 months ago
kyegomez / PALI3
Implementation of PALI3 from the paper "PALI-3 VISION LANGUAGE MODELS: SMALLER, FASTER, STRONGER"
☆145 Updated 2 weeks ago
DavidHuji / CapDec
CapDec: SOTA Zero Shot Image Captioning Using CLIP and GPT2, EMNLP 2022 (findings)
☆198 Updated last year
kyegomez / PALI
Democratization of "PaLI: A Jointly-Scaled Multilingual Language-Image Model"
☆91 Updated last year
mshukor / UnIVAL
[TMLR23] Official implementation of UnIVAL: Unified Model for Image, Video, Audio and Language Tasks.
☆231 Updated last year
haoliuhl / language-quantized-autoencoders
Language Quantized AutoEncoders
☆110 Updated 2 years ago
wade3han / champagne
An official codebase for the paper "CHAMPAGNE: Learning Real-world Conversation from Large-Scale Web Videos (ICCV 23)"
☆52 Updated 2 years ago
allenai / unified-io-inference
☆227 Updated last year
bjoernpl / KOSMOS_reimplementation
A reimplementation of KOSMOS-1 from "Language Is Not All You Need: Aligning Perception with Language Models"
☆27 Updated 2 years ago
ChenDelong1999 / polite-flamingo
🦩 Visual Instruction Tuning with Polite Flamingo - training multi-modal LLMs to be both clever and polite! (AAAI-24 Oral)
☆63 Updated last year
ylsung / VL_adapter
PyTorch code for "VL-Adapter: Parameter-Efficient Transfer Learning for Vision-and-Language Tasks" (CVPR2022)
☆207 Updated 2 years ago
TheoCoombes / ClipCap
Using pretrained encoder and language models to generate captions from multimedia inputs.
☆97 Updated 2 years ago
mlfoundations / VisIT-Bench
☆50 Updated last year