YehLi / TDEN

☆9

Alternatives and similar repositories for TDEN

Users who are interested in TDEN are comparing it to the repositories listed below.

princetonvisualai / SPICE-U
☆11 Updated 4 years ago
baaaad / ECE
[ECCV'22 Poster] Explicit Image Caption Editing
☆22 Updated 2 years ago
yonatanbitton / data_efficient_masked_language_modeling_for_vision_and_language
Repository for the paper "Data Efficient Masked Language Modeling for Vision and Language".
☆18 Updated 3 years ago
Trunpm / TPT-for-VideoQA
☆19 Updated 2 years ago
guilk / VLC
Research code for "Training Vision-Language Transformers from Captions Alone"
☆34 Updated 3 years ago
andreineculai / MPC
☆24 Updated 3 years ago
papermsucode / mdmmt
MDMMT: Multidomain Multimodal Transformer for Video Retrieval
☆26 Updated 4 years ago
YuanEZhou / satic
☆26 Updated 4 years ago
Vision-CAIR / LTVRR
☆35 Updated last year
zmykevin / UVLP
CVPR 2022 (Oral) Pytorch Code for Unsupervised Vision-and-Language Pre-training via Retrieval-based Multi-Granular Alignment
☆22 Updated 3 years ago
zinengtang / DeCEMBERT
Pytorch version of DeCEMBERT: Learning from Noisy Instructional Videos via Dense Captions and Entropy Minimization (NAACL 2021)
☆17 Updated 2 years ago
chenxy99 / SD-FSIC
Official code for the paper "Self-Distillation for Few-Shot Image Captioning"
☆14 Updated 4 years ago
ShiYaya / emscore
Research code for CVPR 2022 paper: "EMScore: Evaluating Video Captioning via Coarse-Grained and Fine-Grained Embedding Matching"
☆26 Updated 2 years ago
MILVLG / rosita
ROSITA: Enhancing Vision-and-Language Semantic Alignments via Cross- and Intra-modal Knowledge Integration
☆56 Updated 2 years ago
hardyqr / HAL
[AAAI'20] Code release for "HAL: Improved Text-Image Matching by Mitigating Visual Semantic Hubs".
☆38 Updated last year
showlab / Region_Learner
The Pytorch implementation for "Video-Text Pre-training with Learned Regions"
☆42 Updated 3 years ago
showlab / DemoVLP
[Arxiv2022] Revitalize Region Feature for Democratizing Video-Language Pre-training
☆21 Updated 3 years ago
zhjohnchan / SK-VG
[CVPR-2023] The official dataset of Advancing Visual Grounding with Scene Knowledge: Benchmark and Method.
☆31 Updated 2 years ago
AndresPMD / semantic_adaptive_margin
WACV 2022 Paper - Is An Image Worth Five Sentences? A New Look into Semantics for Image-Text Matching
☆16 Updated 3 years ago
AmeenAli / VideoMatch
☆12 Updated 3 years ago
princetonvisualai / pointingqa
Code for paper "Point and Ask: Incorporating Pointing into Visual Question Answering"
☆19 Updated 2 years ago
subhc / clever
The Curious Layperson: Fine-Grained Image Recognition without Expert Labels (BMVC 2021 best student paper)
☆23 Updated 3 years ago
Sense-GVT / BigPretrain
A Simple Framework for CV Pre-training Model (SOCO, VirTex, BEiT)
☆15 Updated 3 years ago
KMnP / nn-revisit
Rethinking Nearest Neighbors for Visual Classification
☆31 Updated 3 years ago
LisaAnne / TemporalLanguageRelease
☆43 Updated 4 years ago
yj-yu / CiSIN
Character Grounding and Re-Identification in Story of Videos and Text Descriptions
☆10 Updated 4 years ago
YuanEZhou / CBTrans
☆22 Updated 3 years ago
airsplay / vimpac
☆73 Updated 3 years ago
scwangdyd / large_vocabulary_hoi_detection
Code for ICCV2021: Discovering Human Interactions with Large-Vocabulary Objects via Query and Multi-Scale Detection
☆25 Updated 3 years ago
XLearning-SCU / 2021-CVPR-MRL
Learning Cross-modal Retrieval with Noisy Labels (CVPR 2021, PyTorch Code)
☆13 Updated 4 years ago