Ming-er / MGA-CLAP
Official implementation of MGA-CLAP (ACM MM 2024)
☆15 · Updated 6 months ago
Alternatives and similar repositories for MGA-CLAP
Users interested in MGA-CLAP are comparing it to the libraries listed below:
- This repository collects papers related to speech tokenizers · ☆16 · Updated 7 months ago
- We propose C2SER, a novel audio-language model designed to enhance the stability and accuracy of speech emotion recognition through conte… · ☆30 · Updated 2 months ago
- ☆12 · Updated last year
- Papers in the speech and audio field. Now updated: ICLR 2023-2025, ICML 2023-2024, NeurIPS 2023-2024, ACM MM 2024, AAAI 2024, ACL 2024, EMNLP… · ☆55 · Updated 3 weeks ago
- ☆23 · Updated 7 months ago
- Generative Expressive Conversational Speech Synthesis (Accepted by MM'2024) · ☆58 · Updated 6 months ago
- [INTERSPEECH 2023 Best Paper Shortlist] Official implementation for MT4SSL: Boosting Self-Supervised Speech Representation Learning by In… · ☆44 · Updated last year
- [ICLR 2025] Enhancing Self-Supervised Models with Audio Mixtures for Polyphonic Soundscapes · ☆35 · Updated last week
- ☆71 · Updated last year
- Official implementation and dataset for the paper DFADD: The Diffusion and Flow-matching based Audio Deepfake Dataset · ☆12 · Updated last month
- Official repository for the paper Singing Voice Graph Modeling for SingFake Detection (Interspeech 2024) · ☆25 · Updated 7 months ago
- Source code for DM-Codec · ☆41 · Updated 6 months ago
- Benchmark data and code for MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix · ☆64 · Updated this week
- TraceableSpeech: Towards Proactively Traceable Text-to-Speech with Watermarking · ☆16 · Updated 3 weeks ago
- [ACL 2024] PyTorch code for our paper "StyleDubber: Towards Multi-Scale Style Learning for Movie Dubbing" · ☆83 · Updated 6 months ago
- ☆24 · Updated 7 months ago
- A 6-million Audio-Caption Paired Dataset Built with an LLM- and ALM-based Automatic Pipeline · ☆131 · Updated 5 months ago
- A deepfake audio dataset for detecting fake speech from codec-based speech synthesis systems (Interspeech 2024) · ☆17 · Updated 9 months ago
- ☆19 · Updated 2 years ago
- Implementation of Frieren: Efficient Video-to-Audio Generation Network with Rectified Flow Matching (NeurIPS'24) · ☆36 · Updated last month
- UniCodec: a unified audio codec with a single codebook to support multi-domain audio data, including speech, music, and sound · ☆118 · Updated 2 months ago
- Source code for the Interspeech 2024 paper "Scaling up masked audio encoder learning for general audio classification" · ☆65 · Updated 3 weeks ago
- Small audio language model for reasoning · ☆63 · Updated last month
- ☆11 · Updated 7 months ago
- The dataset and baseline code for Text-to-Audio Grounding (TAG) · ☆42 · Updated 4 months ago
- ☆24 · Updated 10 months ago
- [Official Implementation] Acoustic Autoregressive Modeling 🔥 · ☆67 · Updated 8 months ago
- ☆72 · Updated last month
- LAFMA: A Latent Flow Matching Model for Text-to-Audio Generation (INTERSPEECH 2024) · ☆38 · Updated 11 months ago
- The official train-dev-test release of the Interspeech 2024 Discrete Speech Representation Challenge · ☆32 · Updated last year