XiaoMi / dasheng
Official PyTorch code for Deep Audio-Signal Holistic Embeddings
☆78 Updated last month
Alternatives and similar repositories for dasheng:
Users who are interested in dasheng are comparing it to the libraries listed below.
- Source for the Interspeech 2024 Paper "Scaling up masked audio encoder learning for general audio classification" ☆58 Updated last month
- Official data preparation scripts for the URGENT 2024 Challenge ☆77 Updated 2 months ago
- ☆33 Updated 4 years ago
- The baseline system for the ICASSP2024 ICMC-ASR Challenge. ☆49 Updated last year
- A 6-million Audio-Caption Paired Dataset Built with an LLM- and ALM-based Automatic Pipeline ☆123 Updated 3 months ago
- Source code for Consistent ensemble distillation for audio tagging ☆29 Updated 8 months ago
- Training data simulation ☆47 Updated 10 months ago
- COG-MHEAR Audio-Visual Speech Enhancement Challenge ☆34 Updated last week
- SpEx+(tied) source code ☆80 Updated last year
- The official source code of UniAudio ☆91 Updated last year
- Audio-FLAN ☆140 Updated 3 weeks ago
- ☆37 Updated this week
- ☆43 Updated 2 years ago
- The code for the Interspeech paper "Speaker Embedding Extraction with Phonetic Information" ☆45 Updated 5 years ago
- Code for the Interspeech 2024 paper "MM-KWS: Multi-modal Prompts for Multilingual User-defined Keyword Spotting" ☆26 Updated 3 months ago
- Reference-aware automatic speech evaluation toolkit ☆144 Updated 3 months ago
- ADAPTING SELF-SUPERVISED MODELS TO MULTI-TALKER SPEECH RECOGNITION USING SPEAKER EMBEDDINGS ☆28 Updated 2 years ago
- ☆27 Updated 2 years ago
- AutoPrep: An Automatic Preprocessing Framework for In-the-Wild Speech Data ☆30 Updated last year
- Uformer: A Unet based dilated complex & real dual-path conformer network for simultaneous speech enhancement and dereverberation ☆102 Updated 2 years ago
- This is the repo of the manuscript "Embedding and Beamforming: All-Neural Causal Beamformer for Multichannel Speech Enhancement", which w… ☆85 Updated 2 years ago
- Conferencing Speech Challenge ☆90 Updated 3 years ago
- multi-scale time domain speaker extraction ☆61 Updated 3 years ago
- ☆50 Updated last year
- Implementation of the paper "Self-supervised Learning with Random-projection Quantizer for Speech Recognition" in PyTorch. ☆71 Updated last year
- The official repo: "McNet: Fuse Multiple Cues for Multichannel Speech Enhancement", ICASSP 2023 ☆111 Updated 2 years ago
- Code for calculating DNS_MOS. ☆35 Updated 2 years ago
- BAE-NET: A LOW COMPLEXITY AND HIGH FIDELITY BANDWIDTH-ADAPTIVE NEURAL NETWORK FOR SPEECH SUPER-RESOLUTION ☆68 Updated 7 months ago
- ☆57 Updated 10 months ago
- ☆54 Updated 10 months ago