zinengtang / TVLT
View external linksLinks

PyTorch code for “TVLT: Textless Vision-Language Transformer” (NeurIPS 2022 Oral)

☆124

Alternatives and similar repositories for TVLT

Users that are interested in TVLT are comparing it to the libraries listed below

Sorting:

zinengtang / Perceiver_VL
View on GitHub
PyTorch code for "Perceiver-VL: Efficient Vision-and-Language Modeling with Iterative Latent Attention" (WACV 2023)
☆33Feb 5, 2023Updated 3 years ago
zinengtang / VidLanKD
View on GitHub
Pytorch version of VidLanKD: Improving Language Understanding viaVideo-Distilled Knowledge Transfer (NeurIPS 2021))
☆56Feb 6, 2023Updated 3 years ago
zinengtang / DeCEMBERT
View on GitHub
Pytorch version of DeCEMBERT: Learning from Noisy Instructional Videos via Dense Captions and Entropy Minimization (NAACL 2021)
☆17Jan 12, 2023Updated 3 years ago
jayleicn / mTVRetrieval
View on GitHub
[ACL 2021] mTVR: Multilingual Video Moment Retrieval
☆27Aug 20, 2022Updated 3 years ago
antoyang / FrozenBiLM
View on GitHub
[NeurIPS 2022] Zero-Shot Video Question Answering via Frozen Bidirectional Language Models
☆158Dec 9, 2024Updated last year
Yui010206 / SeViLA
View on GitHub
[NeurIPS 2023] Self-Chained Image-Language Model for Video Localization and Question Answering
☆195Jan 14, 2024Updated 2 years ago
airsplay / vimpac
View on GitHub
☆73Jun 3, 2022Updated 3 years ago
rowanz / merlot_reserve
View on GitHub
Code release for "MERLOT Reserve: Neural Script Knowledge through Vision and Language and Sound"
☆146Jun 1, 2022Updated 3 years ago
VALUE-Leaderboard / DataRelease
View on GitHub
Data Release for VALUE Benchmark
☆30Feb 16, 2022Updated 3 years ago
yonatanbitton / data_efficient_masked_language_modeling_for_vision_and_language
View on GitHub
Repository for the paper "Data Efficient Masked Language Modeling for Vision and Language".
☆18Sep 17, 2021Updated 4 years ago
jayleicn / singularity
View on GitHub
[ACL 2023] Official PyTorch code for Singularity model in "Revealing Single Frame Bias for Video-and-Language Learning"
☆136May 5, 2023Updated 2 years ago
YapengTian / AVVP-ECCV20
View on GitHub
Unified Multisensory Perception: Weakly-Supervised Audio-Visual Video Parsing, ECCV, 2020. (Spotlight)
☆89Jul 25, 2024Updated last year
MikeWangWZHL / VidIL
View on GitHub
Pytorch code for Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners
☆116Sep 15, 2022Updated 3 years ago
jasonppy / word-discovery
View on GitHub
Word Discovery in Visually Grounded, Self-Supervised Speech Models
☆26Dec 4, 2023Updated 2 years ago
zmykevin / UC2
View on GitHub
CVPR 2021 Official Pytorch Code for UC2: Universal Cross-lingual Cross-modal Vision-and-Language Pre-training
☆34Nov 9, 2021Updated 4 years ago
klauscc / VindLU
View on GitHub
☆110Dec 23, 2022Updated 3 years ago
atosystem / SpeechCLIP
View on GitHub
SpeechCLIP: Integrating Speech with Pre-Trained Vision and Language Model, Accepted to IEEE SLT 2022
☆118Nov 25, 2022Updated 3 years ago
Shalev-Lifshitz / MultiAgentVerification
View on GitHub
Multi-Agent Verification: Scaling Test-Time Compute with Multiple Verifiers
☆27Mar 1, 2025Updated 11 months ago
kakaobrain / noc
View on GitHub
☆47Apr 29, 2024Updated last year
easonnie / mlp-vil
View on GitHub
MLPs for Vision and Langauge Modeling (Coming Soon)
☆27Dec 9, 2021Updated 4 years ago
rowanz / merlot
View on GitHub
MERLOT: Multimodal Neural Script Knowledge Models
☆225Mar 15, 2022Updated 3 years ago
HS-YN / PanoAVQA
View on GitHub
Official repository of PanoAVQA: Grounded Audio-Visual Question Answering in 360° Videos (ICCV 2021)
☆17Oct 12, 2021Updated 4 years ago
jialuli-luka / Video-MSG
View on GitHub
Training-free Guidance in Text-to-Video Generation via Multimodal Planning and Structured Noise Initialization
☆24Apr 14, 2025Updated 10 months ago
TengdaHan / TemporalAlignNet
View on GitHub
[CVPR'22 Oral] Temporal Alignment Networks for Long-term Video. Tengda Han, Weidi Xie, Andrew Zisserman.
☆119Oct 9, 2023Updated 2 years ago
amazon-science / video-contrastive-learning
View on GitHub
Video Contrastive Learning with Global Context, ICCVW 2021
☆162May 30, 2022Updated 3 years ago
Chuhanxx / Temporal_Query_Networks
View on GitHub
The implementation of CVPR2021 paper Temporal Query Networks for Fine-grained Video Understanding
☆64Mar 9, 2022Updated 3 years ago
IFICL / SLfM
View on GitHub
Official code for the paper: [ICCV2023] Sound Localization from Motion: Jointly Learning Sound Direction and Camera Rotation
☆41Dec 23, 2023Updated 2 years ago
Splend1d / T5lephone
View on GitHub
Code for T5lephone: Bridging Speech and Text Self-supervised Models for Spoken Language Understanding via Phoneme level T5
☆19Nov 29, 2022Updated 3 years ago
DanielLin94144 / DUAL-textless-SQA
View on GitHub
Textless (ASR-transcript free) Spoken Question Answering. The official release of NMSQA dataset and the implementation of "DUAL: Textless…
☆35Aug 10, 2023Updated 2 years ago
GenjiB / ECLIPSE
View on GitHub
☆34Mar 10, 2023Updated 2 years ago
jason9693 / ETA4LLMs
View on GitHub
Calculating Expected Time for training LLM.
☆38Apr 17, 2023Updated 2 years ago
Guillem96 / data2vec-vision
View on GitHub
PyTorch implementation of Data2Vec self-supervised approach for vision use cases.
☆18Oct 7, 2022Updated 3 years ago
ahmedssabir / Belief-Revision-Score
View on GitHub
Belief Revision based Caption Re-ranker with Visual Semantic Information. COLING 2022
☆11Apr 13, 2025Updated 10 months ago
j-min / HiREST
View on GitHub
Hierarchical Video-Moment Retrieval and Step-Captioning (CVPR 2023)
☆107Jan 23, 2025Updated last year
clip-vil / CLIP-ViL
View on GitHub
[ICLR 2022] code for "How Much Can CLIP Benefit Vision-and-Language Tasks?" https://arxiv.org/abs/2107.06383
☆421Oct 28, 2022Updated 3 years ago
mhh0318 / UniD3
View on GitHub
☆55Feb 9, 2023Updated 3 years ago
showlab / all-in-one
View on GitHub
[CVPR2023] All in One: Exploring Unified Video-Language Pre-training
☆281Mar 25, 2023Updated 2 years ago
LooperXX / ProSLU
View on GitHub
Open source code and data for AAAI 2022 Oral Paper "Text is no more Enough! A Benchmark for Profile-based Spoken Language Understanding"
☆35May 26, 2024Updated last year
declare-lab / MSA-Robustness
View on GitHub
NAACL 2022 paper on Analyzing Modality Robustness in Multimodal Sentiment Analysis
☆31Jan 21, 2023Updated 3 years ago

zinengtang / TVLTView external linksLinks

Alternatives and similar repositories for TVLT

zinengtang / TVLT
View external linksLinks