HuiGuanLab / nrccr

Source code of our MM'22 paper "Cross-Lingual Cross-Modal Retrieval with Noise-Robust Learning"

☆13

Alternatives and similar repositories for nrccr

Users who are interested in nrccr are comparing it to the repositories listed below.

microsoft / multimodal-aligned-recipe-corpus
☆17Updated last year
dialogtekgeek / AVSD-DSTC10_Official
Audio Visual Scene-Aware Dialog (AVSD) Challenge at the 10th Dialog System Technology Challenge (DSTC)
☆27Updated 2 years ago
princetonvisualai / SPICE-U
☆11Updated 4 years ago
Deferf / CLIP_Video_Representation
Use CLIP to represent video for Retrieval Task
☆69Updated 4 years ago
papermsucode / mdmmt
MDMMT: Multidomain Multimodal Transformer for Video Retrieval
☆26Updated 3 years ago
salanueva / UniVSE
UniVSE implementation in Python 3
☆10Updated 4 years ago
zinengtang / VidLanKD
PyTorch version of VidLanKD: Improving Language Understanding via Video-Distilled Knowledge Transfer (NeurIPS 2021)
☆56Updated 2 years ago
gchhablani / multilingual-vqa
Repository for Multilingual-VQA task created during HuggingFace JAX/Flax community week.
☆34Updated 3 years ago
keep-smile-001 / opentqa
opentqa is an open framework for textbook question answering, which includes xtqa, mcan, cmr, mfb, and mutan.
☆11Updated 4 years ago
nabihach / IDA
☆13Updated 5 years ago
hucvl / prn
Procedural Reasoning Networks
☆7Updated 4 years ago
gchhablani / multilingual-image-captioning
☆44Updated 3 years ago
jayleicn / mTVRetrieval
[ACL 2021] mTVR: Multilingual Video Moment Retrieval
☆27Updated 2 years ago
google-research-datasets / maxm
MaXM is a suite of test-only benchmarks for multilingual visual question answering in 7 languages: English (en), French (fr), Hindi (hi),…
☆13Updated last year
yj-yu / CiSIN
Character Grounding and Re-Identification in Story of Videos and Text Descriptions
☆10Updated 4 years ago
ExplainableML / CLEVR-X
CLEVR-X: A Visual Reasoning Dataset for Natural Language Explanations
☆28Updated last year
YehLi / TDEN
☆9Updated 2 years ago
zmykevin / UC2
CVPR 2021 official PyTorch code for UC2: Universal Cross-lingual Cross-modal Vision-and-Language Pre-training
☆34Updated 3 years ago
berniebear / Multi-HT100M
☆53Updated 3 years ago
NewsStoriesData / newsstories.github.io
☆22Updated 2 years ago
zychen423 / KE-VIST
The code and output of our AAAI paper "Knowledge-Enriched Visual Storytelling"
☆40Updated 4 years ago
jayleicn / VideoLanguageFuturePred
[EMNLP 2020] What is More Likely to Happen Next? Video-and-Language Future Event Prediction
☆49Updated 2 years ago
e-bug / cross-modal-ablation
[EMNLP 2021] Code and data for our paper "Vision-and-Language or Vision-for-Language? On Cross-Modal Influence in Multimodal Transformers…
☆20Updated 3 years ago
researchmm / generate-it
A collection of models for image<->text generation, from ACM MM 2021.
☆66Updated 3 years ago
zinengtang / Perceiver_VL
PyTorch code for "Perceiver-VL: Efficient Vision-and-Language Modeling with Iterative Latent Attention" (WACV 2023)
☆33Updated 2 years ago
allenai / x-lxmert
PyTorch code for EMNLP 2020 paper "X-LXMERT: Paint, Caption and Answer Questions with Multi-Modal Transformers"
☆50Updated 3 years ago
VALUE-Leaderboard / DataRelease
Data Release for VALUE Benchmark
☆31Updated 3 years ago
li-xirong / video-retrieval
Deep Learning for Video Retrieval by Natural Language
☆11Updated 5 years ago
salesforce / FactLM
☆10Updated last week
shengyuzhang / VideoTitling
Comprehensive Information Integration Modeling Framework for Video Titling
☆11Updated 4 years ago