michelecafagna26 / VinVL

Original VinVL (and Oscar) repo with an API designed for easy inference

☆8

Alternatives and similar repositories for VinVL

Users that are interested in VinVL are comparing it to the libraries listed below

woojeongjin / FewVLM
A Good Prompt Is Worth Millions of Parameters: Low-resource Prompt-based Learning for Vision-Language Models (ACL 2022)
☆42Updated 3 years ago
zmykevin / UVLP
CVPR 2022 (Oral) PyTorch Code for Unsupervised Vision-and-Language Pre-training via Retrieval-based Multi-Granular Alignment
☆22Updated 3 years ago
MrZilinXiao / AutoVER
[ECCV'24] Official Implementation of Autoregressive Visual Entity Recognizer.
☆14Updated last year
njucckevin / KnowCap
Code for Beyond Generic: Enhancing Image Captioning with Real-World Knowledge using Vision-Language Pre-Training Model
☆12Updated last year
Yangyi-Chen / CoTConsistency
The released data for paper "Measuring and Improving Chain-of-Thought Reasoning in Vision-Language Models".
☆33Updated last year
kugwzk / DiDE
Code for EMNLP 2022 paper “Distilled Dual-Encoder Model for Vision-Language Understanding”
☆30Updated 2 years ago
zmykevin / UC2
CVPR 2021 Official PyTorch Code for UC2: Universal Cross-lingual Cross-modal Vision-and-Language Pre-training
☆34Updated 3 years ago
guilk / KAT
Research code for "KAT: A Knowledge Augmented Transformer for Vision-and-Language"
☆66Updated 3 years ago
eric-ai-lab / CPL
Official implementation of our EMNLP 2022 paper "CPL: Counterfactual Prompt Learning for Vision and Language Models"
☆34Updated 2 years ago
PaulLerner / ViQuAE
Source code and data used in the papers ViQuAE (Lerner et al., SIGIR'22), Multimodal ICT (Lerner et al., ECIR'23) and Cross-modal Retriev…
☆38Updated 7 months ago
phellonchen / awesome-visual-dialog
Recent Advances in Visual Dialog
☆30Updated 2 years ago
limanling / clip-event
☆104Updated 3 years ago
open-vision-language / oven
☆39Updated last year
szzexpoi / POEM
Official Implementation for CVPR 2023 paper "Divide and Conquer: Answering Questions with Object Factorization and Compositional Reasonin…
☆10Updated last year
facebookresearch / reliable_vqa
Implementation for the paper "Reliable Visual Question Answering: Abstain Rather Than Answer Incorrectly" (ECCV 2022: https://arxiv.org/abs…
☆35Updated 2 years ago
HenryHZY / VL-PET
[ICCV2023] Official code for "VL-PET: Vision-and-Language Parameter-Efficient Tuning via Granularity Control"
☆53Updated last year
archiki / RepARe
☆19Updated last year
Pual2013 / FRPT
Fine-grained Retrieval Prompt Tuning
☆3Updated last year
google-deepmind / svo_probes
The SVO-Probes Dataset for Verb Understanding
☆31Updated 3 years ago
sIncerass / MVLPT
code for "Multitask Vision-Language Prompt Tuning" https://arxiv.org/abs/2211.11720
☆56Updated last year
layer6ai-labs / SGG-Seq2Seq
Code for the ICCV'21 paper "Context-aware Scene Graph Generation with Seq2Seq Transformers"
☆43Updated 3 years ago
ZihaoW123 / UniMM
Implementation for the paper "Unified Multimodal Model with Unlikelihood Training for Visual Dialog"
☆13Updated 2 years ago
RitaRamo / extra
Retrieval-augmented Image Captioning
☆13Updated 2 years ago
zjuchenlong / WSAG
[EMNLP'22] Weakly-Supervised Temporal Article Grounding
☆14Updated last year
zhiyuanhubj / Long_form_VideoQA
[EMNLP’24 Main] Encoding and Controlling Global Semantics for Long-form Video Question Answering
☆19Updated 9 months ago
fawazsammani / nlxgpt
NLX-GPT: A Model for Natural Language Explanations in Vision and Vision-Language Tasks, CVPR 2022 (Oral)
☆48Updated last year
shizhediao / DaVinci
Source code for the paper "Prefix Language Models are Unified Modal Learners"
☆43Updated 2 years ago
d-ailin / CLIP-Guided-Decoding
☆17Updated last year
SihengLi99 / TextBind
[2024-ACL]: TextBind: Multi-turn Interleaved Multimodal Instruction-following in the Wild
☆47Updated last year
jialinwu17 / MAVEX
☆30Updated 2 years ago