dhansmair / flamingo-mini
Implementation of the DeepMind Flamingo vision-language model, based on Hugging Face language models and ready for training
☆169Apr 27, 2023Updated 2 years ago
Alternatives and similar repositories for flamingo-mini
Users that are interested in flamingo-mini are comparing it to the libraries listed below
- An open-source framework for training large multimodal models.☆4,068Aug 31, 2024Updated last year
- MultimodalC4 is a multimodal extension of c4 that interleaves millions of images with text.☆952Mar 19, 2025Updated 10 months ago
- 🧀 Code and models for the ICML 2023 paper "Grounding Language Models to Images for Multimodal Inputs and Outputs".☆486Oct 30, 2023Updated 2 years ago
- Deep Learning for Video Retrieval by Natural Language☆11Oct 20, 2019Updated 6 years ago
- Using pretrained encoder and language models to generate captions from multimedia inputs.☆100Mar 11, 2023Updated 2 years ago
- SimVLM: Simple Visual Language Model Pretraining with Weak Supervision☆36Nov 7, 2022Updated 3 years ago
- DataComp: In search of the next generation of multimodal datasets☆770Apr 28, 2025Updated 9 months ago
- ☆200May 10, 2023Updated 2 years ago
- This code provides a PyTorch implementation for OTTER (Optimal Transport distillation for Efficient zero-shot Recognition), as described …☆71Dec 20, 2021Updated 4 years ago
- This repo contains the code and configuration files for reproducing object detection results of FocalNets with DINO☆68Mar 10, 2023Updated 2 years ago
- GRiT: A Generative Region-to-text Transformer for Object Understanding (ECCV2024)☆340Jan 8, 2024Updated 2 years ago
- GIT: A Generative Image-to-text Transformer for Vision and Language☆580Dec 2, 2023Updated 2 years ago
- 🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing imp…☆3,292Mar 5, 2024Updated last year
- ☆133Dec 22, 2023Updated 2 years ago
- Implementation of LaTr: Layout-aware transformer for scene-text VQA, a novel multimodal architecture for Scene Text Visual Question Answer…☆55Oct 30, 2024Updated last year
- [CVPR 2023] Learning Visual Representations via Language-Guided Sampling☆149Apr 13, 2023Updated 2 years ago
- Code for ICCV2021: Discovering Human Interactions with Large-Vocabulary Objects via Query and Multi-Scale Detection☆28Oct 12, 2021Updated 4 years ago
- A Comprehensive Evaluation Benchmark for Open-Vocabulary Detection (AAAI 2024)☆62May 7, 2024Updated last year
- COYO-700M: Large-scale Image-Text Pair Dataset☆1,251Nov 30, 2022Updated 3 years ago
- [ICLR 2024] Analyzing and Mitigating Object Hallucination in Large Vision-Language Models☆155Apr 30, 2024Updated last year
- This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for E…☆544May 19, 2025Updated 8 months ago
- [ECCV 2022] FashionViL: Fashion-Focused V+L Representation Learning☆61Nov 15, 2022Updated 3 years ago
- [ICLR 2024] Fine-tuning LLaMA to follow Instructions within 1 Hour and 1.2M Parameters☆5,936Mar 14, 2024Updated last year
- Official code for the ICLR2023 paper Compositional Prompt Tuning with Motion Cues for Open-vocabulary Video Relation Detection☆43Jun 4, 2024Updated last year
- The test set for Koala☆45Mar 31, 2023Updated 2 years ago
- [ICCV 2021 Oral + TPAMI] Just Ask: Learning to Answer Questions from Millions of Narrated Videos☆126Sep 29, 2023Updated 2 years ago
- baseline mode for the ObjectNet competition☆18Jan 13, 2021Updated 5 years ago
- Code used for the creation of OBELICS, an open, massive and curated collection of interleaved image-text web documents, containing 141M d…☆211Aug 28, 2024Updated last year
- Code for experiments for "ConvNet vs Transformer, Supervised vs CLIP: Beyond ImageNet Accuracy"☆102Sep 11, 2024Updated last year
- An open source implementation of CLIP.☆13,383Updated this week
- SVIT: Scaling up Visual Instruction Tuning☆166Jun 20, 2024Updated last year
- A Comprehensive Benchmark and Toolkit for Evaluating Video-based Large Language Models!☆137Dec 31, 2023Updated 2 years ago
- Course repository for the Spring 2023 COMP664 course "Deep Learning" at UNC☆14Apr 17, 2023Updated 2 years ago
- Lane segmentation model trained with tensorflow implementation MobileNetV2 based U-Net☆11Mar 24, 2023Updated 2 years ago
- A PyTorch implementation of Proxy Anchor Loss based on CVPR 2020 paper "Proxy Anchor Loss for Deep Metric Learning"☆11Jan 16, 2021Updated 5 years ago
- [ICML 2023] Instant Soup: Cheap Pruning Ensembles in A Single Pass Can Draw Lottery Tickets from Large Models. Ajay Jaiswal, Shiwei Liu, Ti…☆11Nov 28, 2023Updated 2 years ago
- A collection of papers tackling automatic fact-checking (particularly of AI-generated content)☆14Nov 3, 2023Updated 2 years ago
- Aligning LMMs with Factually Augmented RLHF☆392Nov 1, 2023Updated 2 years ago
- Official repository of OFA (ICML 2022). Paper: OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence L…☆2,555Apr 24, 2024Updated last year