soham97/sound_ai_progress

Tracking the state of the art and recent results (bibliography) on sound tasks.

☆33

Alternatives and similar repositories for sound_ai_progress

Users who are interested in sound_ai_progress are comparing it to the libraries listed below.

microsoft / NoAudioCaptioning
Repository for "Training Audio Captioning Models without Audio"
☆10 · Sep 26, 2023 · Updated 2 years ago
ccoreilly / deepspeech-catala
Deepspeech ASR Model for the Catalan Language
☆17 · Feb 15, 2021 · Updated 5 years ago
line / WaveTrainerFit
Official implementation of "Wave-Trainer-Fit: Neural Vocoder with Trainable Prior and Fixed-Point Iteration towards High-Quality Speech G…
☆16 · Feb 6, 2026 · Updated 5 months ago
microsoft / AudioEntailment
Audio Entailment: Deductive Reasoning for Audio Understanding
☆17 · Dec 10, 2024 · Updated last year
soham97 / ADIFF
Explaining audio differences using language
☆16 · Feb 11, 2025 · Updated last year
JusperLee / Gull-Codec-Training
☆12 · Mar 11, 2025 · Updated last year
thu-spmi / SPMILM
A SPMI Lab toolkit for language models.
☆11 · Apr 12, 2017 · Updated 9 years ago
h-munakata / Lighthouse-Wrapper-for-Audio-Moment-Retrieval
☆13 · Mar 23, 2026 · Updated 4 months ago
sarulab-speech / spatial_voice_conversion
Spatial Voice Conversion: Voice Conversion Preserving Spatial Information and Non-target Signals
☆18 · Aug 8, 2024 · Updated last year
ftshijt / speech_evaluation
A toolkit dedicated to speech evaluation.
☆23 · Sep 26, 2024 · Updated last year
roudimit / Omni-R1
[ASRU 2025] Omni-R1: Do You Really Need Audio to Fine-Tune Your Audio LLM?
☆47 · Nov 21, 2025 · Updated 8 months ago
Chung-I / youtube-asr-crawler
☆10 · Sep 19, 2022 · Updated 3 years ago
soham97 / PAM
PAM is a no-reference audio quality metric for audio generation tasks
☆77 · Jul 19, 2024 · Updated 2 years ago
maxrmorrison / pypar
Phoneme alignment representation compatible with multiple forced aligners
☆22 · Apr 12, 2024 · Updated 2 years ago
soham97 / awesome-sound_event_detection
Reading list for research topics in Sound AI
☆201 · Aug 8, 2024 · Updated last year
Emrys365 / se-scaling
Model configurations for scaling SE models in the paper "Beyond Performance Plateaus: A Comprehensive Study on Scalability in Speech Enha…
☆41 · Aug 7, 2024 · Updated last year
rhoposit / icassp2021
☆15 · May 8, 2021 · Updated 5 years ago
soham97 / MTL_Weakly_labelled_audio_data
Code repo for "Multi-Task Learning for Interpretable Weakly Labelled Sound Event Detection"
☆17 · Nov 9, 2022 · Updated 3 years ago
snap-research / GenAU
☆53 · Mar 24, 2026 · Updated 4 months ago
Aratako / CALM-DACVAE
An attempt to reproduce CALM (Continuous Audio Language Models) using DACVAE as the audio VAE.
☆18 · Feb 20, 2026 · Updated 5 months ago
motazsaad / ara-pronunciation-tool
A python tool that converts Arabic diacritised text to a sequence of phonemes and creates a pronunciation dictionary. This code is based …
☆15 · Sep 5, 2017 · Updated 8 years ago
sithu31296 / audio-tagging
Easy to use Audio Tagging in PyTorch
☆23 · Aug 22, 2021 · Updated 4 years ago
yoongi43 / VRVQ
Implementation of the paper "Variable Bitrate Residual Vector Quantization for Audio Coding"
☆11 · Apr 10, 2025 · Updated last year
Fraunhofer-IIS / ODAQ
A collection of audio signals accompanied by corresponding subjective scores of perceived quality. Everything under permissive licenses.
☆53 · Feb 24, 2026 · Updated 5 months ago
idiap / zff_vad
Unsupervised Voice Activity Detection by Modeling Source and System Information using Zero Frequency Filtering
☆23 · Oct 19, 2023 · Updated 2 years ago
ckyang1124 / LALM-Evaluation-Survey
Collection of works for evaluating (and analyzing) large audio-language models (LALMs)
☆41 · Aug 11, 2025 · Updated 11 months ago
ircam-cosima / soundworks-nu
Use spectators' smartphones as distributed speakers in live performances, soundworks - Max/MSP based framework
☆35 · Sep 15, 2021 · Updated 4 years ago
xjuspeech / YOLOPitch
☆10 · Jun 11, 2024 · Updated 2 years ago
aispeech-lab / w2v-cif-bert
☆37 · Jun 28, 2021 · Updated 5 years ago
iiscleap / DIHARD-2019-baseline
☆16 · Mar 7, 2019 · Updated 7 years ago
Takaaki-Saeki / ssl_speech_restoration_v2
☆17 · Dec 18, 2023 · Updated 2 years ago
Wataru-Nakata / ssl-vocoders
Implementation of vocoders empowered with pytorch lightning
☆18 · Jan 27, 2024 · Updated 2 years ago
janson9192 / autokws2021
☆13 · Mar 25, 2021 · Updated 5 years ago
kgnlp / allophant
A multilingual phoneme recognizer capable of generalizing zero-shot to unseen phoneme inventories.
☆30 · Mar 14, 2025 · Updated last year
huangruizhe / audio
Data manipulation and transformation for audio signal processing, powered by PyTorch
☆10 · Sep 30, 2024 · Updated last year
Jessegator / SONAR
☆38 · Oct 15, 2024 · Updated last year
andrewosh / peersockets
Directly connect to peers via hyperswarm (or an API-compatible alternative).
☆13 · Jul 16, 2020 · Updated 6 years ago
Plachtaa / ASTRAL-quantization
speaker-disentangled speech linguistic content quantizer
☆26 · Mar 19, 2025 · Updated last year
unza-speech-lab / zambezi-voice
Repository for multilingual speech data resources for native languages of Zambia.
☆22 · Oct 9, 2024 · Updated last year