choijeongsoo / utut
[TASLP 2024] Textless Unit-to-Unit training for Many-to-Many Multilingual Speech-to-Speech Translation
☆30 · Updated 9 months ago
Alternatives and similar repositories for utut
Users who are interested in utut are comparing it to the libraries listed below.
- We propose C2SER, a novel audio-language model designed to enhance the stability and accuracy of speech emotion recognition through conte… ☆31 · Updated 3 months ago
- Generative Expressive Conversational Speech Synthesis (Accepted by MM'2024) ☆59 · Updated 7 months ago
- EMO-SUPERB submission ☆44 · Updated 9 months ago
- 《SpeechPrompt v2: Prompt Tuning for Speech Classification Tasks》: speech processing with the prompting paradigm ☆81 · Updated last year
- Multi-task speech classification of an English speaker's accent and gender on Mozilla's Common Voice dataset ☆27 · Updated 3 weeks ago
- The open-source code for LLM-Codec ☆135 · Updated 10 months ago
- ☆31 · Updated 7 months ago
- [CVPR 2024] AV2AV: Direct Audio-Visual Speech to Audio-Visual Speech Translation with Unified Audio-Visual Speech Representation ☆37 · Updated 9 months ago
- ☆43 · Updated 2 years ago
- [NeurIPS 2024] SD-Eval: A Benchmark Dataset for Spoken Dialogue Understanding Beyond Words ☆50 · Updated last year
- AD-TUNING: An Adaptive CHILD-TUNING Approach to Efficient Hyperparameter Optimization of Child Networks for Speech Processing Tasks in th… ☆11 · Updated last year
- A benchmark to evaluate full-duplex spoken dialogue models on pause handling, backchanneling, turn-taking, and user interruptions. ☆42 · Updated 2 weeks ago
- Vox-Profile Benchmark ☆30 · Updated 2 weeks ago
- This repository presents an evaluation framework for speech-to-speech (S2S) models, following the methodology described in the EmphAsses … ☆21 · Updated last year
- Emotion Rendering for Conversational Speech Synthesis with Heterogeneous Graph-Based Context Modeling (Accepted by AAAI'2024) ☆57 · Updated last year
- [AAAI 2024] Code for CTX-vec2wav in UniCATS ☆129 · Updated last year
- ☆19 · Updated 2 years ago
- Collection of works for evaluating (and analyzing) large audio-language models (LALMs) ☆28 · Updated last week
- Source code and speech samples for the DSU-AVO paper accepted to INTERSPEECH 2023 ☆12 · Updated last year
- WavReward: Spoken Dialogue Models With Generalist Reward Evaluators ☆40 · Updated last month
- ☆33 · Updated 11 months ago
- ☆24 · Updated 6 months ago
- Survey on speech generation work. ☆20 · Updated last year
- This repository follows papers and reports on discrete speech representation learning and speech tokenization methods for speech language… ☆15 · Updated last year
- TMT: Tri-Modal Translation between Speech, Image, and Text by Processing Different Modalities as Different Languages ☆16 · Updated last year
- This is the official train-dev-test release of the Interspeech2024 Discrete Speech Representation Challenge. ☆32 · Updated last year
- The dataset and baseline code for Text-to-Audio Grounding (TAG) ☆42 · Updated 5 months ago
- A toolkit dedicated to speech evaluation. ☆20 · Updated 9 months ago
- Towards a general language-audio model for computational paralinguistic tasks ☆13 · Updated 6 months ago
- [ICASSP 2024] KNN-CTC: Enhancing ASR via Retrieval of CTC Pseudo Labels ☆37 · Updated last year