apsdehal / flava-tutorials
Tutorials for the FLAVA model (https://arxiv.org/abs/2112.04482)
☆12 · Updated 2 years ago
Alternatives and similar repositories for flava-tutorials:
Users interested in flava-tutorials are comparing it to the libraries listed below.
- Image captioning with the Flickr8k dataset ☆14 · Updated 3 years ago
- Repository for the Multilingual-VQA task created during the HuggingFace JAX/Flax community week. ☆34 · Updated 3 years ago
- Repo from the "Learning with limited labeled data" seminar @ Uni of Tuebingen. A collection of notes, notebooks and slideshows to underst… ☆17 · Updated 2 years ago
- PyTorch implementation of VQA: Visual Question Answering (https://arxiv.org/pdf/1505.00468.pdf) using the VQA v2.0 dataset for open-ended ta… ☆19 · Updated 4 years ago
- Video descriptions of research papers relating to foundation models and scaling ☆30 · Updated 2 years ago
- Implementation of TableFormer (Robust Transformer Modeling for Table-Text Encoding) in PyTorch ☆37 · Updated 3 years ago
- Exploring multimodal fusion-type transformer models for visual question answering (on the DAQUAR dataset) ☆34 · Updated 3 years ago
- This repository provides a comprehensive collection of research papers focused on multimodal representation learning, all of which have b… ☆72 · Updated last year
- In-the-wild Question Answering ☆15 · Updated last year
- A collection of multimodal datasets and visual features for VQA and captioning in PyTorch. Just run "pip install multimodal" ☆82 · Updated 3 years ago
- ☆33 · Updated 2 years ago
- ☆28 · Updated last year
- ☆37 · Updated 11 months ago
- SimVLM: Simple Visual Language Model Pretraining with Weak Supervision ☆36 · Updated 2 years ago
- PyTorch implementation of the paper "MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training" ☆23 · Updated this week
- Fine-tuning the OpenAI CLIP model for image search on medical images ☆76 · Updated 3 years ago
- Code for the DDP tutorial ☆32 · Updated 3 years ago
- ☆11 · Updated last year
- Companion repo for the Vision Language Modelling YouTube series (https://bit.ly/3PsbsC2) by Prithivi Da. Open to PRs and collaborations ☆14 · Updated 2 years ago
- NeuSyRE: A Neuro-Symbolic Visual Understanding and Reasoning Framework based on Scene Graph Enrichment ☆18 · Updated last year
- Contrastive Language-Image Pretraining ☆38 · Updated 9 months ago
- ☆47 · Updated 3 years ago
- PyTorch implementation of Soft MoE by Google Brain, from "From Sparse to Soft Mixtures of Experts" (https://arxiv.org/pdf/2308.00951.pdf) ☆71 · Updated last year
- The open-source implementation of the base model behind GPT-4 from OpenAI [Language + Multi-Modal] ☆11 · Updated last year
- A curated list of vision-and-language pre-training (VLP) resources. :-) ☆58 · Updated 2 years ago
- LoRA fine-tuned Stable Diffusion deployment ☆31 · Updated 2 years ago
- This repository contains code for the CVPR 2019 paper "Efficient Video Classification Using Fewer Frames" ☆19 · Updated 4 years ago
- ModelSoups for TensorFlow 2 and Torch ☆48 · Updated 2 years ago
- A dashboard for exploring timm learning rate schedulers ☆19 · Updated 5 months ago
- ☆64 · Updated 3 years ago