apsdehal / flava-tutorials
Tutorials for FLAVA model https://arxiv.org/abs/2112.04482
☆12Updated 3 years ago
Alternatives and similar repositories for flava-tutorials
Users who are interested in flava-tutorials are comparing it to the libraries listed below.
- Exploring multimodal fusion-type transformer models for visual question answering (on DAQUAR dataset)☆37Updated 3 years ago
- ☆134Updated 2 years ago
- Implementation of TableFormer, Robust Transformer Modeling for Table-Text Encoding, in Pytorch☆39Updated 3 years ago
- This repository provides a comprehensive collection of research papers focused on multimodal representation learning, all of which have b…☆82Updated 6 months ago
- PyTorch implementation of FNet: Mixing Tokens with Fourier transforms☆28Updated 4 years ago
- Companion Repo for the Vision Language Modelling YouTube series - https://bit.ly/3PsbsC2 - by Prithivi Da. Open to PRs and collaborations☆14Updated 3 years ago
- ☆101Updated 3 years ago
- Repository for Multilingual-VQA task created during HuggingFace JAX/Flax community week.☆34Updated 4 years ago
- Video descriptions of research papers relating to foundation models and scaling☆30Updated 2 years ago
- Easiest way of fine-tuning HuggingFace video classification models☆147Updated 2 years ago
- TensorFlow implementation of Barlow Twins (https://arxiv.org/abs/2103.03230).☆41Updated 4 years ago
- code for the ddp tutorial☆32Updated 3 years ago
- A collection of multimodal datasets and visual features for VQA and captioning in PyTorch. Just run "pip install multimodal"☆83Updated 3 years ago
- https://slds-lmu.github.io/seminar_multimodal_dl/☆171Updated 2 years ago
- Implementation of the deepmind Flamingo vision-language model, based on Hugging Face language models and ready for training☆168Updated 2 years ago
- Implementation of Zorro, Masked Multimodal Transformer, in Pytorch☆97Updated 2 years ago
- An educational step-by-step implementation of SimCLR that accompanies the blog post☆31Updated 3 years ago
- SimVLM: Simple Visual Language Model Pretraining with Weak Supervision☆36Updated 3 years ago
- ☆33Updated 3 years ago
- Implementation of 🌻 Mirasol, SOTA Multimodal Autoregressive model out of Google Deepmind, in Pytorch☆91Updated last year
- CLIP (Contrastive Language–Image Pre-training) for Italian☆185Updated 2 years ago
- The repository collects many various multi-modal transformer architectures, including image transformer, video transformer, image-languag…☆232Updated 3 years ago
- A modular PyTorch library for vision transformer models☆164Updated 2 years ago
- ☆40Updated last year
- Fine-tuning OpenAI CLIP Model for Image Search on medical images☆77Updated 3 years ago
- ModelSoups for Tensorflow2 and Torch☆50Updated 3 years ago
- Generate text captions for images from their embeddings.☆116Updated 2 years ago
- Image captioning with the Flickr8k dataset☆14Updated 4 years ago
- ☆63Updated 4 years ago
- Simple MAE (masked autoencoders) with pytorch and pytorch-lightning.☆44Updated last year