Muennighoff / vilio
🥶Vilio: State-of-the-art VL models in PyTorch & PaddlePaddle
☆88 · Updated last year
Related projects
Alternatives and complementary repositories for vilio
- Repository containing code from team Kingsterdam for the Hateful Memes Challenge ☆19 · Updated 2 years ago
- An implementation that applies pre-trained V+L models to downstream VQA tasks. Now supports VisualBERT, LXMERT, and UNITER ☆163 · Updated last year
- Code for Dense Relational Captioning ☆67 · Updated last year
- [CVPR 2020] Transform and Tell: Entity-Aware News Image Captioning ☆91 · Updated 6 months ago
- A collection of multimodal datasets and visual features for VQA and captioning in PyTorch. Just run "pip install multimodal" ☆79 · Updated 2 years ago
- Detecting Hate Speech in Memes Using Multimodal Deep Learning Approaches: Prize-winning solution to Hateful Memes Challenge. https://arxi… ☆53 · Updated 8 months ago
- [TACL 2021] Code and data for the framework in "Multimodal Pretraining Unmasked: A Meta-Analysis and a Unified Framework of Vision-and-La… ☆114 · Updated 2 years ago
- Code and resources for the Transformer Encoder Reasoning Network (TERN) - https://arxiv.org/abs/2004.09144 ☆57 · Updated 11 months ago
- PyTorch bottom-up attention with Detectron2 ☆229 · Updated 2 years ago
- Research code for NeurIPS 2020 Spotlight paper "Large-Scale Adversarial Training for Vision-and-Language Representation Learning": UNITER… ☆119 · Updated 3 years ago
- Supports extracting BUTD features for NLVR2 images. ☆18 · Updated 4 years ago
- A self-evident application of the VQA task is to design systems that aid blind people with sight-reliant queries. The VizWiz VQA dataset … ☆14 · Updated 10 months ago
- PyTorch code for EMNLP 2020 paper "Vokenization: Improving Language Understanding with Visual Supervision" ☆186 · Updated 3 years ago
- MERLOT: Multimodal Neural Script Knowledge Models ☆223 · Updated 2 years ago
- BERT + Image Captioning ☆130 · Updated 3 years ago
- PyTorch code for "Unifying Vision-and-Language Tasks via Text Generation" (ICML 2021) ☆361 · Updated last year
- Grid features pre-training code for visual question answering ☆268 · Updated 3 years ago
- Show, Edit and Tell: A Framework for Editing Image Captions, CVPR 2020 ☆81 · Updated 4 years ago
- Source code and pre-trained/fine-tuned checkpoints for NAACL 2021 paper LightningDOT ☆73 · Updated last year
- Implementation for "Large-scale Pretraining for Visual Dialog" https://arxiv.org/abs/1912.02379 ☆95 · Updated 4 years ago
- Good News Everyone! - CVPR 2019 ☆129 · Updated 2 years ago
- Dataset and starter code for the visual entailment dataset ☆108 · Updated 2 years ago
- Video captioning baseline models on the Video2Commonsense dataset. ☆57 · Updated 3 years ago
- CLEVR-X: A Visual Reasoning Dataset for Natural Language Explanations ☆25 · Updated last year