apsdehal / flava-tutorials
Tutorials for the FLAVA model (https://arxiv.org/abs/2112.04482)
☆12 · Updated 2 years ago
Alternatives and similar repositories for flava-tutorials:
Users interested in flava-tutorials are comparing it to the libraries listed below:
- This repository provides a comprehensive collection of research papers focused on multimodal representation learning, all of which have b… ☆70 · Updated last year
- A curated list of vision-and-language pre-training (VLP). :-) ☆56 · Updated 2 years ago
- Repository for the Multilingual-VQA task, created during the HuggingFace JAX/Flax community week. ☆34 · Updated 3 years ago
- Implementation of Qformer from BLIP2 in Zeta Lego blocks. ☆35 · Updated 2 months ago
- Implementation of TableFormer, Robust Transformer Modeling for Table-Text Encoding, in PyTorch ☆37 · Updated 2 years ago
- Hate-CLIPper: Multimodal Hateful Meme Classification with Explicit Cross-modal Interaction of CLIP features - Accepted at EMNLP 2022 Work… ☆46 · Updated last year
- opentqa is an open framework for textbook question answering, which includes xtqa, mcan, cmr, mfb, and mutan. ☆11 · Updated 3 years ago
- A collection of multimodal datasets and visual features for VQA and captioning in PyTorch. Just run "pip install multimodal" ☆81 · Updated 2 years ago
- [TMLR 2022] High-Modality Multimodal Transformer ☆110 · Updated 2 months ago
- Exploring multimodal fusion-type transformer models for visual question answering (on the DAQUAR dataset) ☆34 · Updated 3 years ago
- Implementation of CaiT models in TensorFlow and ImageNet-1k checkpoints. Includes code for inference and fine-tuning. ☆12 · Updated last year
- Repo from the "Learning with limited labeled data" seminar @ Uni of Tuebingen. A collection of notes, notebooks and slideshows to underst… ☆17 · Updated last year
- Companion repo for the Vision Language Modelling YouTube series - https://bit.ly/3PsbsC2 - by Prithivi Da. Open to PRs and collaborations ☆14 · Updated 2 years ago
- Code for any videos ☆26 · Updated 11 months ago
- PyTorch implementation of image captioning using a transformer-based model. ☆62 · Updated last year
- Projects completed under LinuxWorld Informatics Ltd. - MLOps Training. ☆12 · Updated 4 years ago
- Implementation of MaMMUT, a simple vision-encoder text-decoder architecture for multimodal tasks from Google, in PyTorch ☆98 · Updated last year
- [CVPRW22] Official Implementation of T-Food: "Transformer Decoders with MultiModal Regularization for Cross-Modal Food Retrieval". Accept… ☆30 · Updated 2 years ago
- Course repository for the Spring 2023 COMP664 course "Deep Learning" at UNC ☆14 · Updated last year
- NLX-GPT: A Model for Natural Language Explanations in Vision and Vision-Language Tasks, CVPR 2022 (Oral) ☆45 · Updated last year
- In-the-wild Question Answering ☆15 · Updated last year
- Blog of LibreCV.org ☆11 · Updated 3 years ago
- Code and data for ImageCoDe, a contextual vision-and-language benchmark ☆39 · Updated 10 months ago
- PyTorch implementation of Soft MoE by Google Brain in "From Sparse to Soft Mixtures of Experts" (https://arxiv.org/pdf/2308.00951.pdf) ☆69 · Updated last year
- Implementation of Metaformer, but in an autoregressive manner ☆23 · Updated 2 years ago
- PyTorch implementation of FNet: Mixing Tokens with Fourier transforms ☆25 · Updated 3 years ago
- Implementation of 🌻 Mirasol, SOTA Multimodal Autoregressive model out of Google DeepMind, in PyTorch ☆88 · Updated last year