dhansmair / flamingo-mini
Implementation of the deepmind Flamingo vision-language model, based on Hugging Face language models and ready for training
☆165Updated last year
Related projects ⓘ
Alternatives and complementary repositories for flamingo-mini
- Code used for the creation of OBELICS, an open, massive and curated collection of interleaved image-text web documents, containing 141M d…☆189Updated 2 months ago
- 🧀 Code and models for the ICML 2023 paper "Grounding Language Models to Images for Multimodal Inputs and Outputs".☆478Updated last year
- MultiInstruct: Improving Multi-Modal Zero-Shot Learning via Instruction Tuning☆133Updated last year
- MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities (ICML 2024)☆266Updated 2 weeks ago
- Fine-tuning LLaMA to follow Instructions within 1 Hour and 1.2M Parameters☆85Updated last year
- [ICLR'24] Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning☆257Updated 8 months ago
- Open source code for AAAI 2023 Paper "BridgeTower: Building Bridges Between Encoders in Vision-Language Representation Learning"☆158Updated last year
- ☆83Updated last year
- [NeurIPS 2023] Self-Chained Image-Language Model for Video Localization and Question Answering☆178Updated 10 months ago
- Language Models Can See: Plugging Visual Controls in Text Generation☆254Updated 2 years ago
- PyTorch code for "VL-Adapter: Parameter-Efficient Transfer Learning for Vision-and-Language Tasks" (CVPR2022)☆202Updated last year
- M4 experiment logbook☆56Updated last year
- [NeurIPS 2023] This repository includes the official implementation of our paper "An Inverse Scaling Law for CLIP Training"☆298Updated 5 months ago
- [TMLR23] Official implementation of UnIVAL: Unified Model for Image, Video, Audio and Language Tasks.☆224Updated 11 months ago
- MMICL, a state-of-the-art VLM with the in context learning ability from ICL, PKU☆334Updated 11 months ago
- LLMScore: Unveiling the Power of Large Language Models in Text-to-Image Synthesis Evaluation☆125Updated last year
- Code/Data for the paper: "LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding"☆258Updated 5 months ago
- Touchstone: Evaluating Vision-Language Models by Language Models☆78Updated 10 months ago
- VLM Evaluation: Benchmark for VLMs, spanning text generation tasks from VQA to Captioning☆89Updated 2 months ago
- Self-Alignment with Principle-Following Reward Models☆147Updated 8 months ago
- Open LLaMA Eyes to See the World☆175Updated last year
- Recurrent Memory Transformer☆147Updated last year
- Official code for paper "UniIR: Training and Benchmarking Universal Multimodal Information Retrievers" (ECCV 2024)☆110Updated last month
- (CVPR2024)A benchmark for evaluating Multimodal LLMs using multiple-choice questions.☆315Updated 4 months ago
- CapDec: SOTA Zero Shot Image Captioning Using CLIP and GPT2, EMNLP 2022 (findings)☆186Updated 9 months ago
- The official site of paper MMDialog: A Large-scale Multi-turn Dialogue Dataset Towards Multi-modal Open-domain Conversation☆191Updated last year
- Democratization of "PaLI: A Jointly-Scaled Multilingual Language-Image Model"☆86Updated 8 months ago
- Reproducible scaling laws for contrastive language-image learning (https://arxiv.org/abs/2212.07143)☆153Updated 11 months ago
- All-In-One VLM: Image + Video + Transfer to Other Languages / Domains (TPAMI 2023)☆142Updated 3 months ago
- Implementation of PALI3 from the paper PALI-3 VISION LANGUAGE MODELS: SMALLER, FASTER, STRONGER"☆142Updated last week