Jack-ZC8 / M3AV-dataset
[ACL 2024] A Multimodal, Multigenre, and Multipurpose Audio-Visual Academic Lecture Dataset
☆16Updated 3 weeks ago
Alternatives and similar repositories for M3AV-dataset
Users who are interested in M3AV-dataset are comparing it to the repositories listed below.
- A comprehensive overview of affective computing research in the era of large language models (LLMs).☆23Updated 10 months ago
- OpenOmni: Official implementation of Advancing Open-Source Omnimodal Large Language Models with Progressive Multimodal Alignment and Rea…☆62Updated 3 weeks ago
- ☆37Updated last year
- ☆21Updated last month
- ☆11Updated 4 months ago
- ☆38Updated 10 months ago
- Code for ACL 2023 main conference paper "CMOT: Cross-modal Mixup via Optimal Transport for Speech Translation"☆18Updated 7 months ago
- ☆18Updated last year
- A fully open-source implementation of a GPT-4o-like speech-to-speech video understanding model.☆20Updated 2 months ago
- SpeechAgents: Human-Communication Simulation with Multi-Modal Multi-Agent Systems☆82Updated last year
- BLSP-Emo: Towards Empathetic Large Speech-Language Models☆46Updated last year
- This is the official code repository of ICLR 2023 Tiny Paper, "Hierarchical Dialogue Understanding with Special Tokens and Turn-level Att…☆19Updated 2 years ago
- A project for tri-modal LLM benchmarking and instruction tuning.☆38Updated 3 months ago
- Instruct Once, Chat Consistently in Multiple Rounds: An Efficient Tuning Framework for Dialogue (ACL 2024)☆23Updated 10 months ago
- PyTorch implementation of the model from "MIRASOL3B: A MULTIMODAL AUTOREGRESSIVE MODEL FOR TIME-ALIGNED AND CONTEXTUAL MODALITIES"☆26Updated 5 months ago
- [AAAI 2024] DenoSent: A Denoising Objective for Self-Supervised Sentence Representation Learning☆16Updated last year
- ☆11Updated last month
- ☆23Updated 9 months ago
- ☆20Updated 5 months ago
- Code for the paper "Cross-modal Contrastive Learning for Speech Translation" (NAACL 2022)☆64Updated 3 years ago
- GPT-4V with Emotion☆93Updated last year
- (TPAMI'2024) ZeroNLG: Aligning and Autoencoding Domains for Zero-Shot Multimodal and Multilingual Natural Language Generation☆22Updated 10 months ago
- Code for NeurIPS 2023 paper "DASpeech: Directed Acyclic Transformer for Fast and High-quality Speech-to-Speech Translation".☆61Updated 11 months ago
- MIntRec2.0 is the first large-scale dataset for multimodal intent recognition and out-of-scope detection in multi-party conversations (IC…☆48Updated last week
- Code for paper "MIR-GAN: Refining Frame-Level Modality-Invariant Representations with Adversarial Network for Audio-Visual Speech Recogni…☆16Updated 2 years ago
- ☆23Updated last month
- MIntRec: A New Dataset for Multimodal Intent Recognition (ACM MM 2022)☆93Updated last month
- [ICASSP2024] Code for paper "SDIF-DA: A Shallow-to-Deep Interaction Framework with Data Augmentation for Multi-modal Intent Detection"☆11Updated 11 months ago
- AIR-Bench: Benchmarking Large Audio-Language Models via Generative Comprehension☆103Updated 6 months ago
- NAACL 2022 paper on Analyzing Modality Robustness in Multimodal Sentiment Analysis☆31Updated 2 years ago