Implementation of 🦩 Flamingo, state-of-the-art few-shot visual question answering attention net out of Deepmind, in Pytorch
⭐ 1,272 · Oct 18, 2022 · Updated 3 years ago
Alternatives and similar repositories for flamingo-pytorch
Users that are interested in flamingo-pytorch are comparing it to the libraries listed below.
- An open-source framework for training large multimodal models. ⭐ 4,083 · Aug 31, 2024 · Updated last year
- Implementation of the deepmind Flamingo vision-language model, based on Hugging Face language models and ready for training ⭐ 169 · Apr 27, 2023 · Updated 2 years ago
- Official repository of OFA (ICML 2022). Paper: OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence L… ⭐ 2,557 · Apr 24, 2024 · Updated last year
- LAVIS - A One-stop Library for Language-Vision Intelligence ⭐ 11,194 · Nov 18, 2024 · Updated last year
- PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation ⭐ 5,699 · Mar 3, 2026 · Updated 3 weeks ago
- Grounded Language-Image Pre-training ⭐ 2,585 · Jan 24, 2024 · Updated 2 years ago
- An open source implementation of CLIP. ⭐ 13,579 · Mar 12, 2026 · Updated 2 weeks ago
- MultimodalC4 is a multimodal extension of c4 that interleaves millions of images with text. ⭐ 954 · Mar 19, 2025 · Updated last year
- Implementation of CoCa, Contrastive Captioners are Image-Text Foundation Models, in Pytorch ⭐ 1,200 · Dec 12, 2023 · Updated 2 years ago
- COYO-700M: Large-scale Image-Text Pair Dataset ⭐ 1,251 · Nov 30, 2022 · Updated 3 years ago
- [ICLR 2024] Fine-tuning LLaMA to follow Instructions within 1 Hour and 1.2M Parameters ⭐ 5,932 · Mar 14, 2024 · Updated 2 years ago
- Code for ALBEF: a new vision-language pre-training method ⭐ 1,757 · Sep 20, 2022 · Updated 3 years ago
- Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities ⭐ 22,059 · Jan 23, 2026 · Updated 2 months ago
- 🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing imp… ⭐ 3,353 · Mar 5, 2024 · Updated 2 years ago
- Code release for SLIP Self-supervision meets Language-Image Pre-training ⭐ 787 · Feb 9, 2023 · Updated 3 years ago
- [NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond. ⭐ 24,603 · Aug 12, 2024 · Updated last year
- Implementation of RETRO, Deepmind's Retrieval based Attention net, in Pytorch ⭐ 879 · Oct 30, 2023 · Updated 2 years ago
- Implementation of DALL-E 2, OpenAI's updated text-to-image synthesis neural network, in Pytorch ⭐ 11,324 · May 11, 2024 · Updated last year
- 🧀 Code and models for the ICML 2023 paper "Grounding Language Models to Images for Multimodal Inputs and Outputs". ⭐ 485 · Oct 30, 2023 · Updated 2 years ago
- A concise but complete implementation of CLIP with various experimental improvements from recent papers ⭐ 722 · Oct 16, 2023 · Updated 2 years ago
- CLIP (Contrastive Language-Image Pretraining), Predict the most relevant text snippet given an image ⭐ 32,946 · Feb 18, 2026 · Updated last month
- TorchMultimodal is a PyTorch library for training state-of-the-art multimodal multi-task models at scale. ⭐ 1,710 · Updated this week
- Easily turn large sets of image urls to an image dataset. Can download, resize and package 100M urls in 20h on one machine. ⭐ 4,396 · Oct 19, 2025 · Updated 5 months ago
- PyTorch code for "Unifying Vision-and-Language Tasks via Text Generation" (ICML 2021) ⭐ 374 · Jul 29, 2023 · Updated 2 years ago
- Repo for external large-scale work ⭐ 6,542 · Apr 27, 2024 · Updated last year
- Implementation of the specific Transformer architecture from PaLM - Scaling Language Modeling with Pathways ⭐ 828 · Nov 9, 2022 · Updated 3 years ago
- METER: A Multimodal End-to-end TransformER Framework ⭐ 377 · Nov 16, 2022 · Updated 3 years ago
- mPLUG-Owl: The Powerful Multi-modal Large Language Model Family ⭐ 2,540 · Apr 2, 2025 · Updated 11 months ago
- Oscar and VinVL ⭐ 1,052 · Aug 28, 2023 · Updated 2 years ago
- Latest Advances on Multimodal Large Language Models ⭐ 17,505 · Mar 20, 2026 · Updated last week
- DataComp: In search of the next generation of multimodal datasets