Ming-er / MGA-CLAP
Official implementation of MGA-CLAP (ACM MM 2024)
☆15 Updated 7 months ago
Alternatives and similar repositories for MGA-CLAP
Users who are interested in MGA-CLAP are comparing it to the repositories listed below.
- ☆13 Updated last year
- This repository collects papers related to Speech Tokenizer. ☆16 Updated 7 months ago
- We propose C2SER, a novel audio-language model designed to enhance the stability and accuracy of speech emotion recognition through conte… ☆30 Updated 3 months ago
- Curated list for papers, codes and resources related to Text-to-Audio (TTA) Generation ☆28 Updated this week
- Collection of works for evaluating (and analyzing) large audio-language models (LALMs) ☆23 Updated last week
- Generative Expressive Conversational Speech Synthesis (Accepted by MM'2024) ☆59 Updated 7 months ago
- ☆23 Updated 8 months ago
- [ACL 2024] This is the Pytorch code for our paper "StyleDubber: Towards Multi-Scale Style Learning for Movie Dubbing" ☆83 Updated 6 months ago
- [Official Implementation] Acoustic Autoregressive Modeling 🔥 ☆69 Updated 9 months ago
- Implementation of Frieren: Efficient Video-to-Audio Generation Network with Rectified Flow Matching (NeurIPS'24) ☆40 Updated 2 months ago
- [ACL 2025 Main] UniCodec: a unified audio codec with a single codebook to support multi-domain audio data, including speech, music, and s… ☆118 Updated last week
- Source code for DM-Codec. ☆43 Updated this week
- A 6-million Audio-Caption Paired Dataset Built with a LLMs and ALMs-based Automatic Pipeline ☆137 Updated 5 months ago
- 🦇 Encoder of BAT (Learning to Reason about Spatial Sounds with Large Language Models) ☆51 Updated 3 months ago
- ☆80 Updated last week
- [ICASSP2025] Official code for VoiceDiT: Dual-Condition Diffusion Transformer for Environment-Aware Speech Synthesis ☆16 Updated last month
- [ICLR 2025] Enhancing Self-Supervised Models with Audio Mixtures for Polyphonic Soundscapes ☆41 Updated this week
- Official repository for the paper Singing Voice Graph Modeling for SingFake Detection (Interspeech 2024). ☆25 Updated 8 months ago
- Inference codebase for "Cacophony: An Improved Contrastive Audio-Text Model". Preprint: https://arxiv.org/abs/2402.06986 ☆46 Updated 7 months ago
- Source for the Interspeech 2024 Paper "Scaling up masked audio encoder learning for general audio classification" ☆67 Updated last month
- A deepfake audio dataset for detecting fake speech from codec-based speech synthesis systems, Interspeech 2024 ☆17 Updated 10 months ago
- The demo page for ALMTokenizer ☆48 Updated last month
- small audio language model for reasoning ☆64 Updated last month
- The dataset and baseline code for Text-to-Audio Grounding (TAG) ☆42 Updated 4 months ago
- ☆64 Updated last week
- Repository of the WACV'24 paper "Can CLIP Help Sound Source Localization?" ☆31 Updated 3 months ago
- This repository follows papers and reports on discrete speech representation learning and speech tokenization methods for speech language… ☆15 Updated last year
- The official implementation of V-AURA: Temporally Aligned Audio for Video with Autoregression (ICASSP 2025) ☆27 Updated 5 months ago
- ☆71 Updated last year
- LAFMA: A Latent Flow Matching Model for Text-to-Audio Generation (INTERSPEECH 2024) ☆38 Updated 11 months ago