apsdehal / flava-tutorials
Tutorials for the FLAVA model (https://arxiv.org/abs/2112.04482)
☆12 · Updated 2 years ago
Alternatives and similar repositories for flava-tutorials:
Users interested in flava-tutorials are comparing it to the repositories listed below.
- Vision transformer finetuning scripts ☆22 · Updated last year
- This repository provides a comprehensive collection of research papers focused on multimodal representation learning, all of which have b… ☆71 · Updated last year
- PyTorch implementation of Soft MoE by Google Brain in "From Sparse to Soft Mixtures of Experts" (https://arxiv.org/pdf/2308.00951.pdf) ☆71 · Updated last year
- Code for the DDP tutorial ☆32 · Updated 2 years ago
- Implementation of TableFormer, Robust Transformer Modeling for Table-Text Encoding, in PyTorch ☆37 · Updated 2 years ago
- ☆11 · Updated 11 months ago
- PyTorch implementation of FNet: Mixing Tokens with Fourier Transforms ☆25 · Updated 3 years ago
- Repository for the Multilingual-VQA task created during the HuggingFace JAX/Flax community week. ☆34 · Updated 3 years ago
- [TMLR 2022] High-Modality Multimodal Transformer ☆112 · Updated 4 months ago
- ☆46 · Updated 3 years ago
- Exploring multimodal fusion-type transformer models for visual question answering (on the DAQUAR dataset) ☆34 · Updated 3 years ago
- Implementation of MaMMUT, a simple vision-encoder text-decoder architecture for multimodal tasks from Google, in PyTorch ☆100 · Updated last year
- Adversarial examples for the new ConvNeXt architecture ☆20 · Updated 3 years ago
- Implementation of Zorro, Masked Multimodal Transformer, in PyTorch ☆97 · Updated last year
- Examples of using PyTorch hooks, as covered in my YouTube tutorial video. ☆33 · Updated last year
- Repo from the "Learning with limited labeled data" seminar @ Uni of Tuebingen. A collection of notes, notebooks and slideshows to underst… ☆17 · Updated last year
- Towards Understanding the Mixture-of-Experts Layer in Deep Learning ☆22 · Updated last year
- Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities ☆78 · Updated 2 years ago
- ☆43 · Updated 5 months ago
- Google Research ☆46 · Updated 2 years ago
- opentqa is an open framework for textbook question answering, which includes XTQA, MCAN, CMR, MFB, and MUTAN. ☆11 · Updated 3 years ago
- [ICIP 2022 oral] VLCap: Vision-Language with Contrastive Learning for Coherent Video Paragraph Captioning ☆28 · Updated last year
- A curated list of vision-and-language pre-training (VLP). :-) ☆57 · Updated 2 years ago
- Tracing the most recent advances in Physics-Informed LLMs. ☆16 · Updated 5 months ago
- PyTorch implementation of LIMoE ☆53 · Updated 11 months ago
- Generate text captions for images from their embeddings. ☆103 · Updated last year
- Image captioning with the Flickr8k dataset ☆14 · Updated 3 years ago
- PyTorch code for “TVLT: Textless Vision-Language Transformer” (NeurIPS 2022 Oral) ☆121 · Updated 2 years ago
- A collection of multimodal datasets and visual features for VQA and captioning in PyTorch. Just run "pip install multimodal" ☆81 · Updated 3 years ago
- Visual Language Transformer Interpreter - An interactive visualization tool for interpreting vision-language transformers ☆88 · Updated last year