We ranked 1st in the DSTC8 Audio-Visual Scene-Aware Dialog competition. This is the source code for our IEEE/ACM TASLP (AAAI2020-DSTC8-AVSD) paper "Bridging Text and Video: A Universal Multimodal Transformer for Video-Audio Scene-Aware Dialog".
☆56Jun 12, 2023Updated 2 years ago
Alternatives and similar repositories for DSTC8-AVSD
Users who are interested in DSTC8-AVSD are comparing it to the repositories listed below.
- ☆54Nov 18, 2019Updated 6 years ago
- DSTC8-AVSD: Sentence generation task for Audio Visual Scene-aware Dialog☆14Jun 10, 2021Updated 4 years ago
- Code for the paper BiST: Bi-directional Spatio-Temporal Reasoning for Video-Grounded Dialogues (EMNLP20)☆11Jun 16, 2025Updated 8 months ago
- Implementation for the paper "Unified Multimodal Model with Unlikelihood Training for Visual Dialog"☆13May 12, 2023Updated 2 years ago
- Source code for paper "VD-PCR: Improving Visual Dialog with Pronoun Coreference Resolution"☆10Nov 1, 2022Updated 3 years ago
- Implementation for "Large-scale Pretraining for Visual Dialog" https://arxiv.org/abs/1912.02379☆97Mar 31, 2020Updated 5 years ago
- Code for the paper Multimodal Transformer Networks for End-to-End Video-Grounded Dialogue Systems (ACL19)☆100Oct 17, 2022Updated 3 years ago
- Implementation of FixMatch in PyTorch, with experiments☆12Aug 9, 2020Updated 5 years ago
- ☆27May 4, 2020Updated 5 years ago
- Code for SIMMC 2.0: A Task-oriented Dialog Dataset for Immersive Multimodal Conversations☆106Nov 12, 2022Updated 3 years ago
- 🌈 PyTorch Implementation for EMNLP'21 Findings "Reasoning Visual Dialog with Sparse Graph Learning and Knowledge Transfer"☆13Feb 1, 2023Updated 3 years ago
- Discussion questions for the "Advanced Operating Systems Tutorial" course☆12Nov 10, 2019Updated 6 years ago
- Reproduction of 'Weakly Supervised Coupled Networks for Visual Sentiment Analysis'☆13Nov 7, 2019Updated 6 years ago
- Paper List for Dialogue and Interactive Systems☆15Jun 5, 2020Updated 5 years ago
- ☆19Jun 7, 2021Updated 4 years ago
- MERLOT: Multimodal Neural Script Knowledge Models☆226Mar 15, 2022Updated 3 years ago
- Dataset and models for paper "Game-Based Video-Context Dialogue (EMNLP 2018)"☆19Oct 25, 2018Updated 7 years ago
- Benchmark data and code for Question-Answering on Movie stories☆46Apr 17, 2020Updated 5 years ago
- Vision and Language Agent Navigation☆85Jan 29, 2021Updated 5 years ago
- ✨ Official PyTorch Implementation for EMNLP'19 Paper, "Dual Attention Networks for Visual Reference Resolution in Visual Dialog"☆45Mar 19, 2023Updated 2 years ago
- VIsually-Pivoted Audio and(N) Text☆22May 16, 2022Updated 3 years ago
- Official code for our EMNLP2021 Outstanding Paper MindCraft: Theory of Mind Modeling for Situated Dialogue in Collaborative Tasks☆21May 18, 2023Updated 2 years ago
- Code, Models and Datasets for OpenViDial Dataset☆132Jan 22, 2022Updated 4 years ago
- [ECCV 2020] PyTorch code of MMT (a multimodal transformer captioning model) on TVCaption dataset☆90Sep 6, 2023Updated 2 years ago
- Pre-trained V+L Data Preparation☆46Jun 2, 2020Updated 5 years ago
- [NeurIPS2024] Official code for (IMA) Implicit Multimodal Alignment: On the Generalization of Frozen LLMs to Multimodal Inputs☆23Oct 15, 2024Updated last year
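- Course and teaching evaluation script for the University of Chinese Academy of Sciences (UCAS), 2020 spring/summer semester version☆20May 22, 2023Updated 2 years ago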
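- A record of computer-science job openings at major tech companies for the class of 2022 (online application and referral information for early-batch and regular-batch recruiting)☆45Sep 8, 2021Updated 4 years ago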
- A video retrieval dataset How2R and a video QA dataset How2QA☆24Oct 15, 2020Updated 5 years ago
- Starter code for the VMT task and challenge☆51Jul 29, 2020Updated 5 years ago
- Code for CVPR 2022 paper "Scene Consistency Representation Learning for Video Scene Segmentation"☆105Feb 14, 2023Updated 3 years ago
- 💻 Development site and blog for Quansight Labs☆22Mar 6, 2023Updated 2 years ago
- [ICLR'25 Oral] MMIE: Massive Multimodal Interleaved Comprehension Benchmark for Large Vision-Language Models☆35Nov 3, 2024Updated last year
- Lyrics and Vocal Melody Generation conditioned on Accompaniment☆29Aug 27, 2022Updated 3 years ago
- This repository contains code used in our ACL'20 paper History for Visual Dialog: Do we really need it?☆34Mar 24, 2023Updated 2 years ago
- Multimodal deep quality embedding network (MMDQEN) for affective video content analysis. (MM'19, TAFFC'20)☆10Jul 24, 2021Updated 4 years ago
- Dataset and Source code for EMNLP 2019 paper "What You See is What You Get: Visual Pronoun Coreference Resolution in Dialogues"☆26Sep 10, 2021Updated 4 years ago
- Align and Prompt: Video-and-Language Pre-training with Entity Prompts☆188May 1, 2025Updated 10 months ago
- Source code for the AAAI 2021 paper "Movie Summarization via Sparse Graph Construction"☆30Feb 18, 2021Updated 5 years ago