b-sigpro/sed-hsmm

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/b-sigpro/sed-hsmm)

b-sigpro / sed-hsmm

Onset-and-Offset-Aware Sound Event Detection

☆21

Alternatives and similar repositories for sed-hsmm

Users that are interested in sed-hsmm are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

nttcslab / dcase2025_task4_baseline
View on GitHub
☆18Apr 16, 2026Updated 3 months ago
fschmid56 / PretrainedSED
View on GitHub
☆145May 13, 2025Updated last year
CPJKU / cpjku_dcase24
View on GitHub
☆29Oct 17, 2024Updated last year
donghoney0416 / DeepASA
View on GitHub
Official page of "DeepASA: An Object-Oriented Multi-Purpose Network for Auditory Scene Analysis"
☆26Apr 15, 2026Updated 3 months ago
shengcanxu / canoSpeech
View on GitHub
text to speech
☆10Mar 19, 2024Updated 2 years ago
Wordpress hosting with auto-scaling - Free Trial Offer • Ad
Fully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
theMoro / DIRAugmentation
View on GitHub
Improving Recording Device Generalization using Impulse Response Augmentation
☆21Apr 24, 2025Updated last year
fschmid56 / cpjku_dcase23
View on GitHub
This repository contains the code of the CP JKU submission to DCASE23 Task 1 "Low-complexity Acoustic Scene Classification"
☆32Sep 18, 2023Updated 2 years ago
theMoro / EfficientSED
View on GitHub
☆22Jun 12, 2025Updated last year
merlresearch / sebbs
View on GitHub
Prediction of sound event bounding boxes (SEBBs)
☆35Aug 2, 2024Updated last year
BUTSpeechFIT / SOT-DiCoW
View on GitHub
Multi-talker ASR based on DiCoW with Serialized Output Training
☆21Sep 18, 2025Updated 10 months ago
popcornell / FastMSS
View on GitHub
☆32May 18, 2026Updated 2 months ago
sarulab-speech / SpatialCLAP
View on GitHub
☆19Oct 9, 2025Updated 9 months ago
ex3ndr / supervoice-hybrid
View on GitHub
My hybrid TTS network that combines, VALL-E, VoiceBox, SpeechFlow, Seamless and TortoiseTTS into one
☆26Aug 5, 2024Updated last year
Mddct / simple-tts
View on GitHub
（WIP）long form speech generatoins
☆30Apr 2, 2025Updated last year
GPUs on demand by Runpod - Special Offer Available • Ad
Run AI, ML, and HPC workloads on powerful cloud GPUs—without limits or wasted spend. Deploy GPUs in under a minute and pay by the second.
b-sigpro / neural-fcasa
View on GitHub
This is a repository of neural full-rank spatial covariance analysis with speaker activity (neural FCASA).
☆40Mar 12, 2025Updated last year
pengzhendong / audio-pipeline
View on GitHub
☆23Oct 17, 2024Updated last year
idiap / zff_vad
View on GitHub
Unsupervised Voice Activity Detection by Modeling Source and System Information using Zero Frequency Filtering
☆23Oct 19, 2023Updated 2 years ago
KeiKinn / ParaCLAP
View on GitHub
Towards a general language-audio model for computational paralinguistic tasks
☆30Dec 14, 2024Updated last year
OptimusPrimus / tacos
View on GitHub
Temporally-aligned Audio CaptiOnS for Language-Audio Pretraining
☆16Oct 12, 2025Updated 9 months ago
uthree / ddsp-vocoder
View on GitHub
☆12Nov 7, 2024Updated last year
b-sigpro / spectral-feature-compression
View on GitHub
The source code for Input-Adaptive Spectral Feature Compression by Sequence Modeling for Source Separation published in IEEE TASLPRO.
☆18Jun 3, 2026Updated last month
anton-kashkin / hifi_vc
View on GitHub
☆25Jan 24, 2023Updated 3 years ago
inverse-ai / FINALLY-Speech-Enhancement
View on GitHub
FINALLY: Fast and universal speech enhancement model delivering studio-quality audio for a wide range of recordings.
☆28Apr 1, 2026Updated 3 months ago
Managed Database hosting by DigitalOcean • Ad
PostgreSQL, MySQL, MongoDB, Kafka, Valkey, and OpenSearch available. Automatically scale up storage and focus on building your apps.
mush42 / istft-onnx
View on GitHub
Export an ONNX graph that performs ISTFT. Designed for TTS models.
☆28Apr 23, 2024Updated 2 years ago
lavendery / UUG
View on GitHub
☆21Sep 14, 2025Updated 10 months ago
khfs / DuplexMamba
View on GitHub
☆18Mar 6, 2026Updated 4 months ago
yusunnny / CST-former
View on GitHub
CST-former: Transformer with Channel-Spectro-Temporal Attention for Sound Event Localization and Detection (ICASSP 2024)
☆39May 20, 2025Updated last year
apple-yinhan / TQ-SED
View on GitHub
☆24Mar 19, 2025Updated last year
nttrd-mdlab / wearable-seld-dataset
View on GitHub
☆10Feb 18, 2022Updated 4 years ago
v-nhandt21 / MusicVoiceConversion
View on GitHub
Sing any popular song with your voice
☆11Jul 10, 2022Updated 4 years ago
MuyangDu / T5Voice
View on GitHub
T5Voice is a lightweight PyTorch implementation of T5-based text-to-speech synthesis, supporting both streaming and non-streaming speech …
☆28Nov 7, 2025Updated 8 months ago
haoweilou / ParaStyleTTS
View on GitHub
This is the official code for ACM CIKM 2025 Paper: ParaStyleTTS: Toward Efficient and Robust Paralinguistic Style Control for Expressive …
☆59Dec 21, 2025Updated 7 months ago
Deploy open-source AI quickly and easily - Special Bonus Offer • Ad
Runpod Hub is built for open source. One-click deployment and autoscaling endpoints without provisioning your own infrastructure.
xiquan-li / Resonate
View on GitHub
[INTERSPEECH 2026] Pre-training, SFT, DPO and GRPO for Text-to-Audio Generation
☆48Apr 17, 2026Updated 3 months ago
reppy4620 / convnext_tts
View on GitHub
Unofficial implementation of ConvNeXt-TTS powered by lightning
☆18Oct 20, 2024Updated last year
samsad35 / code-ancogen
View on GitHub
[ICASSP 2025] AnCoGen: Analysis, Control and Generation of Speech with a Masked Autoencoder
☆14Mar 11, 2025Updated last year
dieKarotte / Spatial-Omni
View on GitHub
☆28Jun 17, 2026Updated last month
BASHLab / OWL
View on GitHub
☆15May 25, 2026Updated 2 months ago
yukara-ikemiya / wavefit-pytorch
View on GitHub
PyTorch implementation of WaveFit [2022, Google] which is one of SOTA lightweight/fast speech vocoders.
☆70Jul 13, 2026Updated 2 weeks ago
LAION-AI / emotional-speech-annotations
View on GitHub
This repository contains prompts & best practices to annotate audio clips with a very high degree of details using Audio-Language-Models
☆35Oct 13, 2024Updated last year