ictnlp/DSTC8-AVSD

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/ictnlp/DSTC8-AVSD)

ictnlp / DSTC8-AVSD

We rank the 1st in DSTC8 Audio-Visual Scene-Aware Dialog competition. This is the source code for our IEEE/ACM TASLP (AAAI2020-DSTC8-AVSD) paper "Bridging Text and Video: A Universal Multimodal Transformer for Video-Audio Scene-Aware Dialog".

☆56

Alternatives and similar repositories for DSTC8-AVSD

Users that are interested in DSTC8-AVSD are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

hudaAlamri / DSTC7-Audio-Visual-Scene-Aware-Dialog-AVSD-Challenge
View on GitHub
☆54Nov 18, 2019Updated 6 years ago
dialogtekgeek / DSTC8-AVSD_official
View on GitHub
DSTC8-AVSD: Sentence generation task for Audio Visual Scene-aware Dialog
☆14Jun 10, 2021Updated 5 years ago
salesforce / BiST
View on GitHub
Code for the paper BiST: Bi-directional Spatio-Temporal Reasoning for Video-Grounded Dialogues (EMNLP20)
☆11Jun 16, 2025Updated last year
ZihaoW123 / UniMM
View on GitHub
Implementation for the paper "Unified Multimodal Model with Unlikelihood Training for Visual Dialog"
☆13May 12, 2023Updated 3 years ago
HKUST-KnowComp / VD-PCR
View on GitHub
Source code for paper "VD-PCR: Improving Visual Dialog with Pronoun Coreference Resolution"
☆10Nov 1, 2022Updated 3 years ago
Wordpress hosting with auto-scaling - Free Trial Offer • Ad
Fully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
dialogtekgeek / AudioVisualSceneAwareDialog
View on GitHub
☆27May 4, 2020Updated 6 years ago
vmurahari3 / visdial-bert
View on GitHub
Implementation for "Large-scale Pretraining for Visual Dialog" https://arxiv.org/abs/1912.02379
☆95Mar 31, 2020Updated 6 years ago
facebookresearch / DVDialogues
View on GitHub
Code for DVD A Diagnostic Dataset for Multi-step Reasoning in Video Grounded Dialogue
☆14Oct 12, 2021Updated 4 years ago
gicheonkang / sglkt-visdial
View on GitHub
🌈 PyTorch Implementation for EMNLP'21 Findings "Reasoning Visual Dialog with Sparse Graph Learning and Knowledge Transfer"
☆13Feb 1, 2023Updated 3 years ago
facebookresearch / simmc2
View on GitHub
Code for SIMMC 2.0: A Task-oriented Dialog Dataset for Immersive Multimodal Conversations
☆109Nov 12, 2022Updated 3 years ago
idansc / mrr-ndcg
View on GitHub
☆18Jun 10, 2024Updated 2 years ago
gicheonkang / dan-visdial
View on GitHub
✨ Official PyTorch Implementation for EMNLP'19 Paper, "Dual Attention Networks for Visual Reference Resolution in Visual Dialog"
☆44Mar 19, 2023Updated 3 years ago
ramakanth-pasunuru / video-dialogue
View on GitHub
Dataset and models for paper "Game-Based Video-Context Dialogue (EMNLP 2018)"
☆19Oct 25, 2018Updated 7 years ago
ych133 / How2R-and-How2QA
View on GitHub
A video retrieval dataset How2R and a video QA dataset How2QA
☆24Oct 15, 2020Updated 5 years ago
Bare Metal GPUs on DigitalOcean Gradient AI • Ad
Purpose-built for serious AI teams training foundational models, running large-scale inference, and pushing the boundaries of what's possible.
shanwangshan / TAU-urban-audio-visual-scenes
View on GitHub
☆12Oct 23, 2021Updated 4 years ago
rowanz / merlot
View on GitHub
MERLOT: Multimodal Neural Script Knowledge Models
☆226Mar 15, 2022Updated 4 years ago
sdogsq / UCAS-Course-Evaluate-20Spring
View on GitHub
国科大课程评估脚本，2020春/夏季学期版、评教
☆22May 22, 2023Updated 3 years ago
ShannonAI / OpenViDial
View on GitHub
Code, Models and Datasets for OpenViDial Dataset
☆133Jan 22, 2022Updated 4 years ago
ictnlp / AIH
View on GitHub
Code for Findings of ACL 2021 paper "Addressing Inquiries about History: An Efficient and Practical Framework for Evaluating Open-domain …
☆19Dec 16, 2022Updated 3 years ago
roudimit / AVLnet
View on GitHub
Code for the AVLnet (Interspeech 2021) and Cascaded Multilingual (Interspeech 2021) papers.
☆54Mar 30, 2022Updated 4 years ago
zinengtang / DeCEMBERT
View on GitHub
Pytorch version of DeCEMBERT: Learning from Noisy Instructional Videos via Dense Captions and Entropy Minimization (NAACL 2021)
☆17Jan 12, 2023Updated 3 years ago
lichengunc / pretrain-vl-data
View on GitHub
Pre-trained V+L Data Preparation
☆47Jun 2, 2020Updated 6 years ago
quangvnai / visdial
View on GitHub
Visual Dialog: Light-weight Transformer for Many Inputs (ECCV 2020)
☆29Aug 5, 2021Updated 4 years ago
AI Agents on DigitalOcean Gradient AI Platform • Ad
Build production-ready AI agents using customizable tools or access multiple LLMs through a single endpoint. Create custom knowledge bases or connect external data.
salesforce / ALPRO
View on GitHub
Align and Prompt: Video-and-Language Pre-training with Entity Prompts
☆188May 1, 2025Updated last year
sled-group / MindCraft
View on GitHub
Official code for our EMNLP2021 Outstanding Paper MindCraft: Theory of Mind Modeling for Situated Dialogue in Collaborative Tasks
☆21May 18, 2023Updated 3 years ago
mshukor / ima-lmms
View on GitHub
[NeurIPS2024] Official code for (IMA) Implicit Multimodal Alignment: On the Generalization of Frozen LLMs to Multimodal Inputs
☆23Oct 15, 2024Updated last year
jayleicn / TVCaption
View on GitHub
[ECCV 2020] PyTorch code of MMT (a multimodal transformer captioning model) on TVCaption dataset
☆91Sep 6, 2023Updated 2 years ago
HKUST-KnowComp / Visual_PCR
View on GitHub
Dataset and Source code for EMNLP 2019 paper "What You See is What You Get: Visual Pronoun Coreference Resolution in Dialogues"
☆26Sep 10, 2021Updated 4 years ago
Yui010206 / Ego2Web
View on GitHub
[CVPR 2026] Ego2Web: A Web Agent Benchmark Grounded in Egocentric Videos
☆29Mar 25, 2026Updated 4 months ago
idansc / simple-avsd
View on GitHub
Code for ''A Simple Baseline for Audio-Visual Scene-Aware Dialog``
☆27May 26, 2020Updated 6 years ago
ppfliu / emotion-recognition
View on GitHub
Group Gated Fusion on Attention-based Bidirectional Alignment for Multimodal Emotion Recognition
☆15May 10, 2022Updated 4 years ago
shubhamagarwal92 / visdial_conv
View on GitHub
This repository contains code used in our ACL'20 paper History for Visual Dialog: Do we really need it?
☆33Mar 24, 2023Updated 3 years ago
AI Agents on DigitalOcean Gradient AI Platform • Ad
Build production-ready AI agents using customizable tools or access multiple LLMs through a single endpoint. Create custom knowledge bases or connect external data.
shh1574 / multi-modal-dialogue-dataset
View on GitHub
☆22Aug 30, 2021Updated 4 years ago
facebookresearch / codraw-models
View on GitHub
Models for the Collaborative Drawing (CoDraw) task
☆14Jan 15, 2019Updated 7 years ago
idansc / fga
View on GitHub
☆30Updated this week
ARIES-LM / GMNMT
View on GitHub
☆30Nov 3, 2020Updated 5 years ago
feizc / PNAIC
View on GitHub
Partially Non-Autoregressive Image Captioning
☆10Sep 30, 2021Updated 4 years ago
Jason-Young-AI / YoungToolkit
View on GitHub
A Toolkit for a series of Young projects.
☆23Apr 30, 2021Updated 5 years ago
simpleshinobu / visdial-principles
View on GitHub
Implementation for CVPR 2020 Paper "Two Causal Principles for Improving Visual Dialog"
☆31Feb 19, 2023Updated 3 years ago