This repository documents Barry's journey in learning deep learning for speech processing. Here, you'll find scripts and code snippets related to environment setup, data preprocessing, speech frontend, speech recognition, voice conversion, speech synthesis, and more. Let's explore the fascinating world of speech processing together! 🚀🚀🚀
☆13 · Oct 8, 2025 · Updated 4 months ago
Alternatives and similar repositories for barry_speech_tools
Users that are interested in barry_speech_tools are comparing it to the libraries listed below
- A Chinese Expressive Long-dialogue Speech Dataset with Scripts ☆21 · Nov 11, 2024 · Updated last year
- ☆27 · Sep 14, 2024 · Updated last year
- ☆16 · Jun 15, 2022 · Updated 3 years ago
- University of Chinese Academy of Sciences 2023-2024 courses (being updated) ☆12 · Jan 12, 2026 · Updated last month
- The baselines of ARC-Challenge-Interspeech2026 ☆56 · Dec 1, 2025 · Updated 3 months ago
- This repository contains code for an acoustic simulation framework that can be used for acoustic/ultrasonic indoor positioning and/or dat… ☆13 · May 7, 2024 · Updated last year
- This is the implementation of the manuscript "Learning General All-Neural Speech Enhancement based on Taylor's Approximation Theory", whi… ☆14 · Nov 25, 2022 · Updated 3 years ago
- ☆18 · Aug 23, 2024 · Updated last year
- VOICOR: A Residual Iterative Voice Correction Framework for Monaural Speech Enhancement ☆46 · Sep 12, 2024 · Updated last year
- The implementation of TaylorBeamformer, which is in submission to Interspeech2022 ☆48 · Jun 10, 2022 · Updated 3 years ago
- ☆24 · Feb 28, 2023 · Updated 3 years ago
- ☆25 · Sep 30, 2019 · Updated 6 years ago
- A STFT/iSTFT written up in PyTorch using 1D Convolutions ☆32 · Jul 9, 2024 · Updated last year
- ☆15 · Sep 16, 2024 · Updated last year
- ☆10 · Oct 20, 2022 · Updated 3 years ago
- Hierarchical Vision Transformers for Disease Progression Detection in Chest X-Ray Images ☆11 · Jan 11, 2024 · Updated 2 years ago
- An interpreter in C for the language brainfuck. ☆10 · Apr 12, 2023 · Updated 2 years ago
- This is a repository for fine-tuning Qwen2-Audio, currently supporting Distributed Data Parallel (DDP) and DeepSpeed. ☆49 · Jul 28, 2025 · Updated 7 months ago
- This is the accompanying repository to the paper - Automatic Estimation of Singing Voice Musical Dynamics ☆15 · Oct 28, 2024 · Updated last year
- Third place of 2021 IEEE GRSS Data Fusion Contest: Track MSD ☆10 · Mar 31, 2021 · Updated 4 years ago
- An unofficial implementation of Lite-RTSE, a cost-effective lite model for real-time speech enhancement ☆14 · Nov 19, 2023 · Updated 2 years ago
- TDBRAIN EEG Database pre-processing code ☆14 · May 8, 2024 · Updated last year
- ☆11 · Jul 14, 2021 · Updated 4 years ago
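- Implementation of CS336 Assignment 5; the DPO/RLHF parts of the supplementary assignment are also completed, and the ablation analysis is in a Feishu document. For reference only. ☆21 · Sep 27, 2025 · Updated 5 months ago
- On the road of growing as a developer ☆10 · Dec 25, 2018 · Updated 7 years ago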
- ☆12 · Apr 26, 2025 · Updated 10 months ago
- This is the public repository for SALSA-Lite features for polyphonic sound event localization and detection using microphone arrays. ☆12 · Dec 3, 2021 · Updated 4 years ago
- ☆10 · Jun 24, 2021 · Updated 4 years ago
- ☆10 · Sep 2, 2024 · Updated last year
- ACM MM 2022 paper_AVQA: A Dataset for Audio-Visual Question Answering on Videos ☆16 · Aug 17, 2023 · Updated 2 years ago
- Greifswald Sleep Stage Classifier - a deep-learning based EEG sleep stage classifier ☆16 · Aug 22, 2025 · Updated 6 months ago
- Fairness-Aware Representation Learning by Suppressing Attribute-Class Associations ☆12 · Dec 10, 2024 · Updated last year
- ☆38 · Jul 20, 2020 · Updated 5 years ago
- ☆52 · Sep 10, 2024 · Updated last year
- MRSAudio: A Large-Scale Multimodal Recorded Spatial Audio Dataset with Refined Annotations ☆33 · Oct 15, 2025 · Updated 4 months ago
- ☆11 · Mar 22, 2023 · Updated 2 years ago
- Implementation of "Improving Whispered Speech Recognition Performance using Pseudo-whispered based Data Augmentation" ☆13 · Oct 31, 2024 · Updated last year
- Source code for "BLOOM-Net: Blockwise Optimization for Masking Networks Toward Scalable and Efficient Speech Enhancement" ☆14 · Feb 13, 2022 · Updated 4 years ago
- useful things that work with NVIDIA NeMo library ☆14 · Jan 20, 2024 · Updated 2 years ago