Visitor-W / MTDA

MTDA-HSED: Mutual-Assistance Tuning and Dual-Branch Aggregating for Heterogeneous Sound Event Detection

☆9

Alternatives and similar repositories for MTDA

Users who are interested in MTDA are comparing it to the repositories listed below.

Saurabhbhati / DASS
☆11 Updated 2 months ago
wangfangyuan / SChunk-Encoder
SChunk-Encoder (Transformer or Conformer) for streaming E2E ASR
☆9 Updated 2 years ago
apple-yinhan / Noise-robust-SED
☆13 Updated 6 months ago
hmohebbi / disentangling_representations
☆12 Updated 9 months ago
dhimasryan / TMHINT-QI-VoiceMOS2023
☆17 Updated last year
fgnt / speaker_reassignment
Once more Diarization: Improving meeting transcription systems through segment-level speaker reassignment
☆12 Updated 5 months ago
tuanio / nextformer
PyTorch implementation of "Nextformer: A ConvNeXt Augmented Conformer For End-To-End Speech Recognition"
☆11 Updated 2 years ago
Taltt / FNSE-SBGAN
FNSE-SBGAN: Far-field Speech Enhancement with Schrödinger Bridge and Generative Adversarial Networks
☆13 Updated 2 months ago
xiaoxue1117 / speech-mamba-public
☆11 Updated 7 months ago
SSTC-Challenge / SSTC2024_baseline_system
☆11 Updated last year
YoshikiMas / madeon-asr
[SLT'24] Mamba-based Decoder-Only Approach for Speech Recognition
☆14 Updated 7 months ago
onolab-tmu / libss
A Python library for blind source separation.
☆4 Updated 3 months ago
michaelneri / unsupervised-audio-anomaly-detection
Official repository of the work "Low-complexity Unsupervised Audio Anomaly Detection exploiting Separable Convolutions and Angular Loss" …
☆11 Updated 8 months ago
leduckhai / MultiMed-ST
MultiMed-ST: Large-scale Many-to-many Multilingual Medical Speech Translation
☆13 Updated 3 months ago
heungky / trainable-STFT-Mel
Understanding Audio Features via Trainable Basis Functions
☆9 Updated 3 years ago
chaufanglin / Normal2Whisper
Implementation of "Improving Whispered Speech Recognition Performance using Pseudo-whispered based Data Augmentation"
☆11 Updated 8 months ago
Honee-W / CPTNN
Unofficial implementation of "CPTNN: CROSS-PARALLEL TRANSFORMER NEURAL NETWORK FOR TIME-DOMAIN SPEECH ENHANCEMENT"
☆15 Updated last year
jh-cha-prml / JELLY
Code for the paper "JELLY: Joint Emotion Recognition and Context Reasoning with LLMs for Conversational Speech Synthesis"
☆12 Updated 8 months ago
kjw11 / CSEnet-ASR
Cross-Speaker Encoding Network for Multi-talker Speech Recognition
☆11 Updated 4 months ago
IU-SAIGE / pse
Efficient Personalized Speech Enhancement through Self-Supervised Learning
☆21 Updated 2 years ago
ZhaoF-i / ASTWS-AEC
Attention-Enhanced Short-Time Wiener Solution for Acoustic Echo Cancellation
☆16 Updated 2 weeks ago
zds-potato / multilingual-phonetic-sv
☆9 Updated last year
huaidanquede / Dense-TSNet
Official code for Dense-TSNet
☆12 Updated 10 months ago
Speech-Arena / speech_df_arena
☆18 Updated 3 months ago
prairie-schooner / wav2vec-vc
☆11 Updated 2 years ago
BUTSpeechFIT / OOV-recovery-in-hybrid-ASR-system
☆9 Updated 5 years ago
aispeech-lab / TinyWASE
PyTorch implementation of TinyWASE described in our paper "Compressing Speaker Extraction Model with Ultra-low Precision Quantization and…
☆11 Updated 4 years ago
lexkoro / cfm-vc
☆11 Updated 4 months ago
zjzser / WMCodec
PyTorch Implementation of [WMCodec: End-to-End Neural Speech Codec with Deep Watermarking for Authenticity Verification](https://arxiv.or…
☆14 Updated 8 months ago
haoxiangsnr / audioinfo
A small tool to calculate the distribution of audio durations in a directory
☆14 Updated 2 years ago