Ming-er / MGA-CLAP
Official implementation of MGA-CLAP (ACM MM 2024)
☆14 Updated 6 months ago
Alternatives and similar repositories for MGA-CLAP:
Users who are interested in MGA-CLAP are comparing it to the repositories listed below.
- ☆26 Updated last week
- This repository collects papers related to Speech Tokenizer. ☆17 Updated 6 months ago
- We propose C2SER, a novel audio-language model designed to enhance the stability and accuracy of speech emotion recognition through conte… ☆26 Updated last month
- It includes papers on speech&audio field. Now update: ICLR2023-2025, ICML2023-2024, NeurIPS2023-2024, ACMMM2024, AAAI2024, ACL2024, EMNLP… ☆49 Updated this week
- ☆22 Updated 6 months ago
- ☆12 Updated last year
- [ACL 2024] This is the Pytorch code for our paper "StyleDubber: Towards Multi-Scale Style Learning for Movie Dubbing" ☆82 Updated 5 months ago
- [ICLR 2025] Enhancing Self-Supervised Models with Audio Mixtures for Polyphonic Soundscapes ☆30 Updated this week
- [INTERSPEECH 2023 Best Paper Shortlist] Official implementation for MT4SSL: Boosting Self-Supervised Speech Representation Learning by In… ☆44 Updated last year
- Generative Expressive Conversational Speech Synthesis (Accepted by MM'2024) ☆58 Updated 5 months ago
- [Official Implementation] Acoustic Autoregressive Modeling 🔥 ☆67 Updated 8 months ago
- LUCY: Linguistic Understanding and Control Yielding Early Stage of Her ☆37 Updated last week
- The dataset and baseline code for Text-to-Audio Grounding (TAG) ☆42 Updated 3 months ago
- A 6-million Audio-Caption Paired Dataset Built with a LLMs and ALMs-based Automatic Pipeline ☆126 Updated 4 months ago
- [ACM MM24] Official implementation of paper "From Speaker to Dubber: Movie Dubbing with Prosody and Duration Consistency Learning" ☆26 Updated 3 months ago
- A deepfake audio dataset for detecting fake speech from codec-based speech synthesis systems, Interspeech 2024 ☆17 Updated 8 months ago
- ☆62 Updated last month
- This is a general framework for fake audio detection using pytorch lightning ☆20 Updated this week
- This package aims at simplifying the download of the AudioCaps dataset. ☆33 Updated last year
- The demo page for ALMTokenizer ☆43 Updated last week
- ☆70 Updated last year
- Implementation of Frieren: Efficient Video-to-Audio Generation Network with Rectified Flow Matching (NeurIPS'24) ☆34 Updated 3 weeks ago
- Source code for DM-Codec. ☆41 Updated 6 months ago
- TraceableSpeech: Towards Proactively Traceable Text-to-Speech with Watermarking ☆15 Updated last week
- [ACL 2024] Generative Pre-Trained Speech Language Model with Efficient Hierarchical Transformer ☆52 Updated 5 months ago
- LibriSpeech-Long is a benchmark dataset for long-form speech generation and processing. Released as part of "Long-Form Speech Generation … ☆62 Updated 3 months ago
- ☆11 Updated 7 months ago
- Source for the Interspeech 2024 Paper "Scaling up masked audio encoder learning for general audio classification" ☆62 Updated this week
- Official Implementation and Dataset of paper - DFADD: The Diffusion and Flow-matching based Audio Deepfake Dataset ☆12 Updated 2 weeks ago
- [CVPR 2024] AV2AV: Direct Audio-Visual Speech to Audio-Visual Speech Translation with Unified Audio-Visual Speech Representation ☆35 Updated 7 months ago