Implementation of the DeepMind Flamingo vision-language model, based on Hugging Face language models and ready for training
☆169 · Apr 27, 2023 · Updated 2 years ago
Alternatives and similar repositories for flamingo-mini
Users that are interested in flamingo-mini are comparing it to the libraries listed below.
- Implementation of 🦩 Flamingo, state-of-the-art few-shot visual question answering attention net out of DeepMind, in PyTorch · ☆1,272 · Oct 18, 2022 · Updated 3 years ago
- An open-source framework for training large multimodal models. · ☆4,079 · Aug 31, 2024 · Updated last year
- ☆11 · Nov 21, 2024 · Updated last year
- MultimodalC4 is a multimodal extension of c4 that interleaves millions of images with text. · ☆954 · Mar 19, 2025 · Updated last year
- Using pretrained encoder and language models to generate captions from multimedia inputs. · ☆100 · Mar 11, 2023 · Updated 3 years ago
- Official repository for "Embodied Agents Meet Personalization: Investigating Challenges and Solutions Through the Lens of Memory Utilizat… · ☆20 · Oct 24, 2025 · Updated 5 months ago
- SimVLM: Simple Visual Language Model Pretraining with Weak Supervision · ☆36 · Nov 7, 2022 · Updated 3 years ago
- 🧀 Code and models for the ICML 2023 paper "Grounding Language Models to Images for Multimodal Inputs and Outputs". · ☆485 · Oct 30, 2023 · Updated 2 years ago
- ☆12 · Mar 14, 2023 · Updated 3 years ago
- SVIT: Scaling up Visual Instruction Tuning · ☆166 · Jun 20, 2024 · Updated last year
- ☆12 · Feb 11, 2026 · Updated last month
- DataComp: In search of the next generation of multimodal datasets · ☆773 · Apr 28, 2025 · Updated 10 months ago
- Official implementation of SEED-LLaMA (ICLR 2024). · ☆642 · Sep 21, 2024 · Updated last year
- Implementation of LaTr: Layout-aware transformer for scene-text VQA, a novel multimodal architecture for Scene Text Visual Question Answer… · ☆56 · Oct 30, 2024 · Updated last year
- A reimplementation of KOSMOS-1 from "Language Is Not All You Need: Aligning Perception with Language Models" · ☆27 · Mar 3, 2023 · Updated 3 years ago
- 🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing imp… · ☆3,344 · Mar 5, 2024 · Updated 2 years ago
- ☆134 · Dec 22, 2023 · Updated 2 years ago
- Show the time in Roman Numerals · ☆11 · Jan 23, 2020 · Updated 6 years ago
- Chain of Images for Intuitively Reasoning · ☆10 · Nov 29, 2023 · Updated 2 years ago
- ☆26 · Jun 5, 2023 · Updated 2 years ago
- This code provides a PyTorch implementation for OTTER (Optimal Transport distillation for Efficient zero-shot Recognition), as described … · ☆71 · Dec 20, 2021 · Updated 4 years ago
- Deep Learning for Video Retrieval by Natural Language · ☆11 · Oct 20, 2019 · Updated 6 years ago
- [ICLR 2024] Fine-tuning LLaMA to follow Instructions within 1 Hour and 1.2M Parameters · ☆5,932 · Mar 14, 2024 · Updated 2 years ago
- GRiT: A Generative Region-to-text Transformer for Object Understanding (ECCV 2024) · ☆341 · Jan 8, 2024 · Updated 2 years ago
- ☆199 · May 10, 2023 · Updated 2 years ago
- GIT: A Generative Image-to-text Transformer for Vision and Language · ☆579 · Dec 2, 2023 · Updated 2 years ago
- COYO-700M: Large-scale Image-Text Pair Dataset · ☆1,251 · Nov 30, 2022 · Updated 3 years ago
- [CVPR 2023] Learning Visual Representations via Language-Guided Sampling · ☆150 · Apr 13, 2023 · Updated 2 years ago
- Perf monitoring CLI tool for Apple Silicon · ☆10 · Jan 25, 2023 · Updated 3 years ago
- Code used for the creation of OBELICS, an open, massive and curated collection of interleaved image-text web documents, containing 141M d… · ☆211 · Aug 28, 2024 · Updated last year
- [ICCV 2021 Oral + TPAMI] Just Ask: Learning to Answer Questions from Millions of Narrated Videos · ☆127 · Sep 29, 2023 · Updated 2 years ago
- [ICLR 2024] Analyzing and Mitigating Object Hallucination in Large Vision-Language Models · ☆155 · Apr 30, 2024 · Updated last year
- A Comprehensive Benchmark and Toolkit for Evaluating Video-based Large Language Models! · ☆138 · Dec 31, 2023 · Updated 2 years ago
- An open source implementation of CLIP. · ☆13,579 · Mar 12, 2026 · Updated 2 weeks ago
- Teacher-student distillation using DeepSpeed · ☆19 · Oct 7, 2022 · Updated 3 years ago
- Official repository of OFA (ICML 2022). Paper: OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence L… · ☆2,557 · Apr 24, 2024 · Updated last year
- [ACL 2024 Findings & ICLR 2024 WS] An Evaluator VLM that is open-source, offers reproducible evaluation, and is inexpensive to use. Specific… · ☆81 · Sep 13, 2024 · Updated last year
- ☆24 · Jun 18, 2025 · Updated 9 months ago
- ☆231 · Dec 18, 2023 · Updated 2 years ago