PeiwenSun2000 / Both-Ears-Wide-Open
The official repo for Both Ears Wide Open: Towards Language-Driven Spatial Audio Generation
☆13 · Updated last month
Related projects ☆
Alternatives and complementary repositories for Both-Ears-Wide-Open
- [Official Implementation] Acoustic Autoregressive Modeling 🔥 ☆57 · Updated 2 months ago
- The official repository of the SpeechCraft dataset, a large-scale expressive bilingual speech dataset with natural language descriptions. ☆51 · Updated last month
- A 6-million Audio-Caption Paired Dataset Built with an LLM- and ALM-based Automatic Pipeline ☆66 · Updated 2 weeks ago
- Efficient synchronization from sparse cues ☆28 · Updated 6 months ago
- ☆35 · Updated last year
- The official implementation of V-AURA: Temporally Aligned Audio for Video with Autoregression ☆13 · Updated last month
- 🦇 Encoder of BAT (Learning to Reason about Spatial Sounds with Large Language Models) ☆32 · Updated last month
- PAM is a no-reference audio quality metric for audio generation tasks ☆49 · Updated 4 months ago
- Public Code for the paper MAE-AST: Masked Autoencoding Audio Spectrogram Transformer ☆83 · Updated 2 years ago
- Source code for the paper 'Audio Captioning Transformer' ☆50 · Updated 2 years ago
- ☆47 · Updated last week
- ☆34 · Updated 5 months ago
- Implementation of the paper, T-FOLEY: A Controllable Waveform-Domain Diffusion Model for Temporal-Event-Guided Foley Sound Synthesis, ac… ☆26 · Updated 5 months ago
- Emotion Rendering for Conversational Speech Synthesis with Heterogeneous Graph-Based Context Modeling (Accepted by AAAI'2024) ☆51 · Updated 5 months ago
- Source code for "Sparse in Space and Time: Audio-visual Synchronisation with Trainable Selectors." (Spotlight at the BMVC 2022) ☆51 · Updated 9 months ago
- This package aims at simplifying the download of the AudioCaps dataset. ☆30 · Updated 11 months ago
- Adversarial Training of Denoising Diffusion Model Using Dual Discriminators for High-Fidelity Multi-Speaker TTS ☆35 · Updated last year
- [ACL 2024] This is the PyTorch code for our paper "StyleDubber: Towards Multi-Scale Style Learning for Movie Dubbing" ☆50 · Updated last week
- Findings of ACL 2023 | AlignSTS: a speech-to-singing (STS) model based on modality disentanglement and cross-modal alignment ☆65 · Updated 4 months ago
- The implementation of the paper "SpeechTripleNet: End-to-End Disentangled Speech Representation Learning for Content, Timbre and Prosody" ☆29 · Updated 11 months ago
- Official source code of the INTERSPEECH 2023 paper: "Audio-Visual Speech Separation in Noisy Environments with a Lightweight Iterative Mo… ☆19 · Updated last year
- ☆55 · Updated 11 months ago
- ☆44 · Updated 4 months ago
- (Interspeech 2023 & ICASSP 2024) Official repository for ARMHuBERT and STaRHuBERT ☆38 · Updated 2 months ago
- PyTorch implementation for “V2C: Visual Voice Cloning” ☆30 · Updated last year
- [InterSpeech'2024] FluentEditor: Text-based Speech Editing by Considering Acoustic and Prosody Consistency ☆48 · Updated 3 weeks ago
- Official Implementation of EnCLAP (ICASSP 2024) ☆90 · Updated 5 months ago
- Unofficial PyTorch reproduction for the paper "Utilizing Neural Transducers for Two-Stage Text-to-Speech via Semantic Token Prediction" (… ☆58 · Updated 7 months ago
- Official implementation of the paper "BigCodec: Pushing the Limits of Low-Bitrate Neural Speech Codec" ☆82 · Updated 2 months ago
- ☆30 · Updated last year