roger-tseng / av-superb
A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models (ICASSP 2024)
☆51 · Updated 11 months ago
Alternatives and similar repositories for av-superb:
Users who are interested in av-superb are comparing it to the libraries listed below.
- 🦇 Encoder of BAT (Learning to Reason about Spatial Sounds with Large Language Models) ☆45 · Updated last month
- ☆39 · Updated 2 years ago
- 《SpeechPrompt v2: Prompt Tuning for Speech Classification Tasks》 Speech processing with prompting paradigm ☆80 · Updated last year
- A 6-million Audio-Caption Paired Dataset Built with an LLM- and ALM-based Automatic Pipeline ☆119 · Updated 3 months ago
- ☆46 · Updated 2 years ago
- DinoSR: Self-Distillation and Online Clustering for Self-supervised Speech Representation Learning ☆46 · Updated last year
- The official repository of SpeechCraft dataset, a large-scale expressive bilingual speech dataset with natural language descriptions. ☆111 · Updated 2 months ago
- (Interspeech 2023 & ICASSP 2024) Official repository for ARMHuBERT and STaRHuBERT ☆39 · Updated 6 months ago
- [AAAI 2024] CTX-txt2vec, the acoustic model in UniCATS ☆63 · Updated 4 months ago
- This repository presents an evaluation framework for speech-to-speech (S2S) models, following the methodology described in the EmphAsses … ☆14 · Updated last year
- Implementation of Frieren: Efficient Video-to-Audio Generation Network with Rectified Flow Matching (NeurIPS'24) ☆30 · Updated 4 months ago
- Source for the Interspeech 2024 Paper "Scaling up masked audio encoder learning for general audio classification" ☆56 · Updated last month
- The open source code for LLM-Codec ☆132 · Updated 7 months ago
- The open source code for SimpleSpeech series ☆133 · Updated 5 months ago
- Emotion Rendering for Conversational Speech Synthesis with Heterogeneous Graph-Based Context Modeling (Accepted by AAAI'2024) ☆53 · Updated 8 months ago
- Source code for the paper 'Audio Captioning Transformer' ☆53 · Updated 3 years ago
- [ACL 2024] Generative Pre-Trained Speech Language Model with Efficient Hierarchical Transformer ☆51 · Updated 4 months ago
- LAFMA: A Latent Flow Matching Model for Text-to-Audio Generation (INTERSPEECH 2024) ☆38 · Updated 9 months ago
- This package aims at simplifying the download of the AudioCaps dataset. ☆32 · Updated last year
- Generative Expressive Conversational Speech Synthesis (Accepted by MM'2024) ☆53 · Updated 4 months ago
- Models and code for RepCodec: A Speech Representation Codec for Speech Tokenization ☆169 · Updated 8 months ago
- PyTorch implementation for “V2C: Visual Voice Cloning” ☆31 · Updated 2 years ago
- ☆29 · Updated 3 months ago
- Code for vec2wav 2.0, a speech token vocoder for VC. Paper: https://arxiv.org/abs/2409.01995 ☆71 · Updated 3 months ago
- ☆22 · Updated 4 years ago