satsuki2486441738 / FusionAudio

Towards Fine-grained Audio Captioning with Multimodal Contextual Cues

☆22

Alternatives and similar repositories for FusionAudio

Users interested in FusionAudio are comparing it to the repositories listed below.


JusperLee / Gull-Codec-Training
☆13 · Updated 2 months ago
yangdongchao / ALMTokenizer2
The open source code of ALMTokenizer2: Towards Low bit-rate and Semantic-rich Audio Tokenizer with Flow-based Scalar Diffusion Transforme…
☆26 · Updated 2 weeks ago
thuhcsi / SnakeGAN
Please visit https://thuhcsi.github.io/SnakeGAN/
☆37 · Updated 2 years ago
pengzhendong / streaming-vocos
Streaming Vocos
☆26 · Updated 4 months ago
ETH-DISCO / discoder
Official repo for DisCoder: High-Fidelity Music Vocoder using Neural Audio Codecs presented at ICASSP 2025
☆29 · Updated 3 months ago
ftshijt / Interspeech2024_DiscreteSpeechChallenge
This is the official train-dev-test release of the Interspeech2024 Discrete Speech Representation Challenge.
☆32 · Updated last year
Yip-Jia-Qi / codecformer
☆17 · Updated 10 months ago
yongyizang / TrainingFreeMultiStepASR
Official Repository for "Training-Free Multi-Step Audio Source Separation"
☆35 · Updated last week
Mddct / transformer-vocos
☆28 · Updated 3 weeks ago
jiaqili3 / DualCodec
A Low-Frame-Rate, Semantically-Enhanced Neural Audio Codec for Speech Generation
☆33 · Updated this week
fundwotsai2001 / Text-to-music-dataset-preparation
A repo that builds text to music datasets from scratch
☆21 · Updated 2 weeks ago
streichgeorg / autosing
☆12 · Updated 4 months ago
ryota-komatsu / speech_resynth
Speech Resynthesis and Language Modeling
☆17 · Updated this week
yuhanghe01 / RiTTA
Event Relation in Text-to-Audio (TTA) Generation
☆19 · Updated 3 months ago
Ereboas / MagiCodec
A single-layer, streaming codec model providing SOTA audio quality and discrete tokens designed for superior downstream modelability.
☆41 · Updated this week
ex3ndr / supervoice-librilight-preprocessed
60k hours of phoneme-aligned audio from audiobooks
☆18 · Updated 10 months ago
p1an-lin-jung / wv_tts
☆19 · Updated last year
XiaoyuBIE1994 / SDCodec
(ICASSP 2025) Learning Source Disentanglement in Neural Audio Codec
☆33 · Updated 3 weeks ago
Sreyan88 / CompA
Code for ICLR 2024 Paper: CompA: Addressing the Gap in Compositional Reasoning in Audio-Language Models
☆18 · Updated 10 months ago
exercise-book-yq / FreeCodec
FreeCodec: A Disentangled Neural Speech Codec with Fewer Tokens
☆20 · Updated 9 months ago
MTG / SingWithExpressions
Accompanying repository for the paper "Automatic Estimation of Singing Voice Musical Dynamics"
☆13 · Updated 7 months ago
asuni / PitchSqueezer
A robust pitch tracker using synchrosqueezed FFT and frequency-domain autocorrelation
☆34 · Updated last year
lucadellalib / discrete-wavlm-codec
A neural speech codec based on discrete WavLM representations
☆24 · Updated 9 months ago
xinshengwang / robpitch
A pitch detection model trained to be robust to noisy and reverberant environments.
☆25 · Updated 4 months ago
amphionspace / tts-evaluation
An evaluation set for large-scale trained TTS models (Coming in Sep 2024)
☆12 · Updated 9 months ago
huutuongtu / Lightvoc
LightVoc: An Upsampling-Free GAN Vocoder Based on Conformer and Inverse Short-Time Fourier Transform
☆16 · Updated last year
Audio-Foundation-Models / ConversationTTS
☆64 · Updated this week
Mddct / simple-tts
(WIP) Long-form speech generation
☆31 · Updated 2 months ago
jisang93 / VISinger
Unofficial PyTorch implementation of VISinger: Variational Inference with Adversarial Learning for End-to-end Singing Voice Synthesis (IC…
☆15 · Updated 2 years ago
yluo42 / SRVQ
Spherical residual vector quantization (SRVQ)
☆28 · Updated 9 months ago