sony/soundctm

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/sony/soundctm)

sony / soundctm

Pytorch implementation of SoundCTM

☆101

Alternatives and similar repositories for soundctm

Users that are interested in soundctm are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

yuhanghe01 / RiTTA
View on GitHub
Event Relation in Text-to-Audio (TTA) Generation
☆21Feb 26, 2025Updated last year
yangdongchao / LLM-Codec
View on GitHub
The open source code for LLM-Codec
☆147Aug 18, 2024Updated last year
sony / diffusion-timbre-transfer
View on GitHub
☆56Nov 5, 2024Updated last year
pengzhendong / streaming-vocos
View on GitHub
Streaming Vocos
☆31Jun 10, 2025Updated last year
bytedance / Make-An-Audio-2
View on GitHub
a text-conditional diffusion probabilistic model capable of generating high fidelity audio.
☆197May 29, 2024Updated 2 years ago
Simple, predictable pricing with DigitalOcean hosting • Ad
Always know what you'll pay with monthly caps and flat pricing. Enterprise-grade infrastructure trusted by 600k+ customers.
line / promptttspp
View on GitHub
PromptTTS++: Controlling Speaker Identity in Prompt-Based Text-To-Speech Using Natural Language Descriptions
☆86Oct 11, 2024Updated last year
NVIDIA / audio-intelligence
View on GitHub
Elucidated Text-To-Audio (ETTA) is a SOTA text-to-audio model with a holistic understanding of the design space and trained with syntheti…
☆137Mar 3, 2026Updated 4 months ago
Pliploop / GDRetriever
View on GitHub
Official implementation of the paper - GD-Retriever: Controllable generative text-music retrieval with diffusion models (Accepted at ISMI…
☆19Sep 25, 2025Updated 10 months ago
Berkeley-Speech-Group / sylber
View on GitHub
Sylber: Syllabic Embedding Representation of Speech from Raw Audio
☆80Mar 17, 2025Updated last year
sh-lee-prml / PeriodWave
View on GitHub
The official Implementation of PeriodWave and PeriodWave-Turbo
☆225Apr 14, 2025Updated last year
SonyResearch / VRVQ
View on GitHub
Variable Bitrate Residual Vector Quantization for Audio Coding
☆54May 1, 2025Updated last year
glory20h / VoiceLDM
View on GitHub
VoiceLDM: Text-to-Speech with Environmental Context
☆194Aug 9, 2024Updated last year
yongyizang / AreYouReallyListening
View on GitHub
Official Repository for ISMIR 2025 paper "Are you really listening? Boosting Perceptual Awareness in Music-QA Benchmarks"
☆20Aug 18, 2025Updated 11 months ago
Sreyan88 / CompA
View on GitHub
Code for ICLR 2024 Paper: CompA: Addressing the Gap in Compositional Reasoning in Audio-Language Models
☆23Jul 10, 2024Updated 2 years ago
Serverless GPU API endpoints on Runpod - Get Bonus Credits • Ad
Skip the infrastructure headaches. Auto-scaling, pay-as-you-go, no-ops approach lets you focus on innovating your application.
ex3ndr / supervoice-hybrid
View on GitHub
My hybrid TTS network that combines, VALL-E, VoiceBox, SpeechFlow, Seamless and TortoiseTTS into one
☆26Aug 5, 2024Updated last year
facebookresearch / FlowDec
View on GitHub
An neural full-band audio codec for general audio sampled at 48 kHz with 7.5 kps or 4.5 kbps.
☆212Jun 22, 2026Updated last month
AI-S2-Lab / FluentEditor
View on GitHub
[InterSpeech'2024] FluentEditor:Text-based Speech Editing by Considering Acoustic and Prosody Consistency
☆62Oct 23, 2024Updated last year
lavendery / AudioComposer
View on GitHub
☆27Sep 10, 2025Updated 10 months ago
yluo42 / SRVQ
View on GitHub
Spherical residual vector quantization (SRVQ)
☆31Aug 25, 2024Updated last year
SonyCSLParis / codicodec
View on GitHub
Encode and decode audio samples to/from continuous and discrete compressed representations!
☆121Nov 25, 2025Updated 8 months ago
habla-liaa / encodecmae
View on GitHub
Codebase for the paper 'EncodecMAE: Leveraging neural codecs for universal audio representation learning'
☆101Jul 24, 2024Updated 2 years ago
cwitkowitz / ss-mpe
View on GitHub
Code for the paper "Toward Fully Self-Supervised Multi-Pitch Estimation".
☆25Sep 27, 2025Updated 9 months ago
Sreyan88 / GAMA
View on GitHub
Code for the paper: GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities
☆153Dec 5, 2024Updated last year
Bare Metal GPUs on DigitalOcean Gradient AI • Ad
Purpose-built for serious AI teams training foundational models, running large-scale inference, and pushing the boundaries of what's possible.
jjunak-yun / FLowHigh_code
View on GitHub
[ICASSP 2025] "FLowHigh: Towards efficient and high-quality audio super-resolution with single-step flow matching"
☆118Jan 17, 2025Updated last year
qiuqiangkong / audioflow
View on GitHub
☆130Updated this week
DiffAPF / LA-2A
View on GitHub
Feed-forward compressor experiments source code for "Differentiable All-pole Filters for Time-varying Audio Systems".
☆24Jun 10, 2024Updated 2 years ago
p1an-lin-jung / wv_tts
View on GitHub
☆19Mar 22, 2024Updated 2 years ago
sony / timbre-trap
View on GitHub
Code for the paper "Timbre-Trap: A Low-Resource Framework for Instrument-Agnostic Music Transcription"
☆43May 5, 2024Updated 2 years ago
archinetai / cqt-pytorch
View on GitHub
An invertible and differentiable implementation of the Constant-Q Transform (CQT).
☆73Dec 9, 2022Updated 3 years ago
Stability-AI / stable-audio-metrics
View on GitHub
Metrics for evaluating music and audio generative models – with a focus on long-form, full-band, and stereo generations.
☆300Updated this week
naver-ai / usdm
View on GitHub
Official PyTorch implementation of "Paralinguistics-Aware Speech-Empowered LLMs for Natural Conversation" (NeurIPS 2024)
☆95Dec 3, 2024Updated last year
suhitaghosh10 / emo-stargan
View on GitHub
Implementation of Emo-StarGAN
☆48Dec 19, 2023Updated 2 years ago
Simple, predictable pricing with DigitalOcean hosting • Ad
Always know what you'll pay with monthly caps and flat pricing. Enterprise-grade infrastructure trusted by 600k+ customers.
ilpoviertola / V-AURA
View on GitHub
The official implementation of V-AURA: Temporally Aligned Audio for Video with Autoregression (ICASSP 2025) (Oral)
☆35Feb 11, 2026Updated 5 months ago
gwh22 / LAFMA
View on GitHub
LAFMA: A Latent Flow Matching Model for Text-to-Audio Generation (INTERSPEECH 2024)
☆44Jun 13, 2024Updated 2 years ago
sp-nitech / diffsptk
View on GitHub
A differentiable version of SPTK
☆201Jul 14, 2026Updated last week
happylittlecat2333 / Auffusion
View on GitHub
Official codes and models of the paper "Auffusion: Leveraging the Power of Diffusion and Large Language Models for Text-to-Audio Generati…
☆194Mar 25, 2024Updated 2 years ago
sh-lee97 / grafx
View on GitHub
GRAFX: An Open-Source Library for Audio Processing Graphs in PyTorch
☆139Jun 29, 2026Updated 3 weeks ago
haoheliu / SemantiCodec
View on GitHub
☆45Jun 11, 2024Updated 2 years ago
xiquan-li / MeanAudio
View on GitHub
[ACL 2026 Main] MeanAudio: Fast and Faithful Text-to-Audio Generation with Mean Flows
☆142Sep 2, 2025Updated 10 months ago