Implementation of the DeepMind Flamingo vision-language model, based on Hugging Face language models and ready for training
★170 · Apr 27, 2023 · Updated 2 years ago
Alternatives and similar repositories for flamingo-mini
Users that are interested in flamingo-mini are comparing it to the libraries listed below.
- Implementation of 🦩 Flamingo, state-of-the-art few-shot visual question answering attention net out of DeepMind, in PyTorch ★1,268 · Oct 18, 2022 · Updated 3 years ago
- An open-source framework for training large multimodal models. ★4,084 · Aug 31, 2024 · Updated last year
- ★11 · Nov 21, 2024 · Updated last year
- MultimodalC4 is a multimodal extension of C4 that interleaves millions of images with text. ★954 · Mar 19, 2025 · Updated last year
- Using pretrained encoder and language models to generate captions from multimedia inputs. ★100 · Mar 11, 2023 · Updated 3 years ago
- Official repository for "Embodied Agents Meet Personalization: Investigating Challenges and Solutions Through the Lens of Memory Utilizat… ★20 · Oct 24, 2025 · Updated 5 months ago
- SimVLM: Simple Visual Language Model Pretraining with Weak Supervision ★36 · Nov 7, 2022 · Updated 3 years ago
- 🧀 Code and models for the ICML 2023 paper "Grounding Language Models to Images for Multimodal Inputs and Outputs". ★485 · Oct 30, 2023 · Updated 2 years ago
- ★12 · Mar 14, 2023 · Updated 3 years ago
- This repository contains code and dataset splits for the paper "Classification by Attention: Scene Graph Classification with Prior Knowle… ★16 · May 27, 2022 · Updated 3 years ago
- DataComp: In search of the next generation of multimodal datasets ★774 · Apr 28, 2025 · Updated 11 months ago
- Implementation of LaTr: Layout-aware transformer for scene-text VQA, a novel multimodal architecture for Scene Text Visual Question Answer… ★56 · Oct 30, 2024 · Updated last year
- Official implementation of SEED-LLaMA (ICLR 2024). ★641 · Sep 21, 2024 · Updated last year
- A reimplementation of KOSMOS-1 from "Language Is Not All You Need: Aligning Perception with Language Models" ★27 · Mar 3, 2023 · Updated 3 years ago
- 🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing imp… ★3,360 · Mar 5, 2024 · Updated 2 years ago
- ★134 · Dec 22, 2023 · Updated 2 years ago
- Chain of Images for Intuitively Reasoning ★10 · Nov 29, 2023 · Updated 2 years ago
- This code provides a PyTorch implementation for OTTER (Optimal Transport distillation for Efficient zero-shot Recognition), as described … ★71 · Dec 20, 2021 · Updated 4 years ago
- Official code for the ICLR 2023 paper Compositional Prompt Tuning with Motion Cues for Open-vocabulary Video Relation Detection ★43 · Jun 4, 2024 · Updated last year
- Deep Learning for Video Retrieval by Natural Language ★11 · Oct 20, 2019 · Updated 6 years ago
- [ICLR 2024] Fine-tuning LLaMA to follow Instructions within 1 Hour and 1.2M Parameters ★5,925 · Mar 14, 2024 · Updated 2 years ago
- ★201 · May 10, 2023 · Updated 2 years ago
- GIT: A Generative Image-to-text Transformer for Vision and Language ★578 · Dec 2, 2023 · Updated 2 years ago
- COYO-700M: Large-scale Image-Text Pair Dataset ★1,252 · Nov 30, 2022 · Updated 3 years ago
- [CVPR 2023] Learning Visual Representations via Language-Guided Sampling ★151 · Apr 13, 2023 · Updated 3 years ago
- [ICCV 2021 Oral + TPAMI] Just Ask: Learning to Answer Questions from Millions of Narrated Videos ★127 · Sep 29, 2023 · Updated 2 years ago
- [ICLR 2024] Analyzing and Mitigating Object Hallucination in Large Vision-Language Models ★156 · Apr 30, 2024 · Updated last year
- Code used for the creation of OBELICS, an open, massive and curated collection of interleaved image-text web documents, containing 141M d… ★212 · Aug 28, 2024 · Updated last year
- A Comprehensive Benchmark and Toolkit for Evaluating Video-based Large Language Models! ★138 · Dec 31, 2023 · Updated 2 years ago
- An open source implementation of CLIP. ★13,658 · Apr 6, 2026 · Updated last week
- Teacher-student distillation using DeepSpeed ★19 · Oct 7, 2022 · Updated 3 years ago
- Official repository of OFA (ICML 2022). Paper: OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence L… ★2,558 · Apr 24, 2024 · Updated last year
- [ACL 2024 Findings & ICLR 2024 WS] An Evaluator VLM that is open-source, offers reproducible evaluation, and is inexpensive to use. Specific… ★83 · Sep 13, 2024 · Updated last year
- ★24 · Jun 18, 2025 · Updated 9 months ago
- ★231 · Dec 18, 2023 · Updated 2 years ago
- [ICML 2023] Instant Soup: Cheap Pruning Ensembles in a Single Pass Can Draw Lottery Tickets from Large Models. Ajay Jaiswal, Shiwei Liu, Ti… ★11 · Nov 28, 2023 · Updated 2 years ago
- aigc evals ★10 · Dec 2, 2023 · Updated 2 years ago
- ★20 · May 30, 2024 · Updated last year
- Recent Advances in Vision and Language Pre-training (VLP) ★297 · Jun 6, 2023 · Updated 2 years ago