limanling / clip-eventLinks

☆106

Alternatives and similar repositories for clip-event

Users that are interested in clip-event are comparing it to the libraries listed below

Sorting:

zmykevin / UC2
CVPR 2021 Official Pytorch Code for UC2: Universal Cross-lingual Cross-modal Vision-and-Language Pre-training
☆34Updated 4 years ago
MichaelZhouwang / VLUE
This repo contains codes and instructions for baselines in the VLUE benchmark.
☆41Updated 3 years ago
zmykevin / UVLP
CVPR 2022 (Oral) Pytorch Code for Unsupervised Vision-and-Language Pre-training via Retrieval-based Multi-Granular Alignment
☆22Updated 3 years ago
MILVLG / rosita
ROSITA: Enhancing Vision-and-Language Semantic Alignments via Cross- and Intra-modal Knowledge Integration
☆56Updated 2 years ago
kugwzk / DiDE
Code for EMNLP 2022 paper “Distilled Dual-Encoder Model for Vision-Language Understanding”
☆31Updated 2 years ago
Zhiquan-Wen / D-VQA
PyTorch implementation of "Debiased Visual Question Answering from Feature and Sample Perspectives" (NeurIPS 2021)
☆26Updated 3 years ago
guilk / KAT
Research code for "KAT: A Knowledge Augmented Transformer for Vision-and-Language"
☆69Updated 3 years ago
PhoebusSi / SAR
Code for our ACL2021 paper: "Check It Again: Progressive Visual Question Answering via Visual Entailment"
☆31Updated 3 years ago
zengyan-97 / CCLM
Cross-View Language Modeling: Towards Unified Cross-Lingual Cross-Modal Pre-training (ACL 2023))
☆92Updated 2 years ago
AndersonStra / MuKEA
MuKEA: Multimodal Knowledge Extraction and Accumulation for Knowledge-based Visual Question Answering
☆99Updated 2 years ago
limanling / m2e2
Cross-media Structured Common Space for Multimedia Event Extraction (ACL2020)
☆76Updated 2 years ago
yuleiniu / cfvqa
[CVPR 2021] Counterfactual VQA: A Cause-Effect Look at Language Bias
☆126Updated 3 years ago
woojeongjin / FewVLM
A Good Prompt Is Worth Millions of Parameters: Low-resource Prompt-based Learning for Vision-Language Models (ACL 2022)
☆43Updated 3 years ago
luomancs / retriever_reader_for_okvqa
☆18Updated 2 years ago
PaulLerner / ViQuAE
Source code and data used in the papers ViQuAE (Lerner et al., SIGIR'22), Multimodal ICT (Lerner et al., ECIR'23) and Cross-modal Retriev…
☆38Updated 11 months ago
YuJungHeo / kbvqa-public
☆40Updated 2 years ago
e-bug / iglue
[ICML 2022] Code and data for our paper "IGLUE: A Benchmark for Transfer Learning across Modalities, Tasks, and Languages"
☆49Updated 2 years ago
open-vision-language / oven
☆40Updated 2 years ago
hackerchenzhuo / LaKo
[Paper][IJCKG 2022] LaKo: Knowledge-driven Visual Question Answering via Late Knowledge-to-Text Injection
☆26Updated last year
thunlp / PEVL
Source code for EMNLP 2022 paper “PEVL: Position-enhanced Pre-training and Prompt Tuning for Vision-language Models”
☆48Updated 3 years ago
ShannonAI / OpenViDial
Code, Models and Datasets for OpenViDial Dataset
☆132Updated 3 years ago
phellonchen / awesome-visual-dialog
Recent Advances in Visual Dialog
☆30Updated 3 years ago
microsoft / M3P
Multitask Multilingual Multimodal Pre-training
☆71Updated 2 years ago
NeverMoreLCH / Awesome-VQA
A reading list of papers about Visual Question Answering.
☆35Updated 3 years ago
Junction4Nako / mvp_pytorch
pytorch implementation of mvp: a multi-stage vision-language pre-training framework
☆34Updated 2 years ago
jialinwu17 / MAVEX
☆30Updated 2 years ago
allenai / multimodalqa
☆146Updated 3 years ago
ZihaoW123 / UniMM
Implementation for the paper "Unified Multimodal Model with Unlikelihood Training for Visual Dialog"
☆13Updated 2 years ago
YujieLu10 / IACE-NLU
Official repo for "Imagination-Augmented Natural Language Understanding", NAACL 2022.
☆17Updated 3 years ago
mad-red / VSR-guided-CIC
Human-like Controllable Image Captioning with Verb-specific Semantic Roles.
☆36Updated 3 years ago