☆12 · Apr 26, 2025 · Updated 10 months ago
Alternatives and similar repositories for DASS
Users who are interested in DASS are comparing it to the repositories listed below
- ☆10 · Apr 17, 2024 · Updated last year
- Avalinguo Audio Dataset: Dataset for Speaker Fluency Level Classification ☆13 · Aug 13, 2018 · Updated 7 years ago
- Code associated with the paper: CTC-DRO: Robust Optimization for Reducing Language Disparities in Speech Recognition. ☆15 · May 16, 2025 · Updated 9 months ago
- Audio-visual diarization pipeline used for creating VoxConverse dataset ☆21 · Jun 6, 2025 · Updated 9 months ago
- Word Discovery in Visually Grounded, Self-Supervised Speech Models ☆26 · Dec 4, 2023 · Updated 2 years ago
- Temporary anonymous version ☆22 · Mar 20, 2024 · Updated last year
- Official PyTorch implementation for "Zero-AVSR: Zero-Shot Audio-Visual Speech Recognition with LLMs by Learning Language-Agnostic Speech … ☆33 · May 11, 2025 · Updated 9 months ago
- This is the official implementation of reverberant speech to room impulse response estimator ☆41 · Aug 7, 2024 · Updated last year
- Repository of the WACV'24 paper "Can CLIP Help Sound Source Localization?" ☆34 · Feb 21, 2025 · Updated last year
- ☆37 · Jun 30, 2022 · Updated 3 years ago
- Official Pytorch implementation of "Large Language Models are Strong Audio-Visual Speech Recognition Learners" [ICASSP 2025] and "Mitigat… ☆56 · Jan 18, 2026 · Updated last month
- Public Code for the paper MAE-AST: Masked Autoencoding Audio Spectrogram Transformer ☆91 · Jun 9, 2022 · Updated 3 years ago
- Official Implementation of Jointist ☆37 · Jul 26, 2023 · Updated 2 years ago
- A python implementation of “Self-Supervised Learning of Spatial Acoustic Representation with Cross-Channel Signal Reconstruction and Mult… ☆39 · Oct 11, 2024 · Updated last year
- ☆37 · Mar 26, 2024 · Updated last year
- Russian phonetical transcription ☆11 · Nov 19, 2025 · Updated 3 months ago
- [ICLR 2025] Enhancing Self-Supervised Models with Audio Mixtures for Polyphonic Soundscapes ☆57 · Oct 8, 2025 · Updated 4 months ago
- Whisper finetuning ☆16 · Apr 9, 2025 · Updated 10 months ago
- Grapheme-to-phoneme tool for corpus conversion, where phonemes match Phoible inventories ☆19 · Apr 10, 2025 · Updated 10 months ago
- eCMU: An Efficient Phase-aware Framework for Music Source Separation with Conformer (IEEE RIVF23) ☆10 · Oct 30, 2024 · Updated last year
- Code for the paper "RIR-in-a-Box : Estimating Room Acoustics from 3D Mesh Data through Shoebox Approximation" presented at Interspeech 20… ☆16 · Sep 1, 2024 · Updated last year
- A tool to collect/validate audio recordings from workers on Amazon Mechanical Turk. Written in Python/Flask. (originally hosted on github… ☆14 · Dec 19, 2022 · Updated 3 years ago
- KittenTTS is an ultra-lightweight, CPU-friendly text-to-speech model with 15M params for real-time, high-quality voices. Open source, fas… ☆23 · Updated this week
- SChunk-Encoder (Transformer or Conformer) for streaming E2E ASR ☆11 · Oct 21, 2022 · Updated 3 years ago
- ☆13 · Oct 9, 2025 · Updated 4 months ago
- Official repository of the work "Low-complexity Unsupervised Audio Anomaly Detection exploiting Separable Convolutions and Angular Loss" … ☆10 · Nov 6, 2024 · Updated last year
- A python script COMMAND LINE utility to AUTO GENERATE SUBTITLE FILE (using free Vosk Speech Recognition API) and TRANSLATED SUBTITLE FILE… ☆11 · May 5, 2024 · Updated last year
- ☆11 · Aug 11, 2023 · Updated 2 years ago
- Learning an Interpretable End-to-End Network for Real-Time Acoustic Beamforming ☆15 · Aug 20, 2024 · Updated last year
- Unsupervised phone and word segmentation using dynamic programming on self-supervised VQ features. ☆39 · Mar 4, 2024 · Updated 2 years ago
- Deepspeech/Coqui AI speech to text systems in Esperanto. - Parolrekoniloj en Esperanto uzante Deepspeech/Coqui Ai. ☆10 · Jan 11, 2022 · Updated 4 years ago
- A Model (maybe an app) that translates the audio of a video from one language to another language, cloning the voice of original video wi… ☆15 · May 19, 2025 · Updated 9 months ago
- Project for HIDING SPEAKER’S SEX IN SPEECH USING ZERO-EVIDENCE SPEAKER REPRESENTATION IN AN ANALYSIS/SYNTHESIS PIPELINE ☆15 · Nov 30, 2022 · Updated 3 years ago
- 🎵 muse: Music Separation ☆11 · Feb 14, 2024 · Updated 2 years ago
- Multilingual acoustic word embedding approaches applied and evaluated on GlobalPhone data. ☆11 · Nov 3, 2020 · Updated 5 years ago
- Target speaker automatic speech recognition (TS-ASR) ☆12 · Oct 14, 2023 · Updated 2 years ago
- Code for the paper: MACE: Leveraging Audio for Evaluating Audio Captioning Systems ☆13 · Jan 16, 2025 · Updated last year
- ☆26 · Nov 3, 2025 · Updated 4 months ago
- Arabic Grapheme-to-Phoneme (G2P) Conversion ☆13 · Mar 15, 2025 · Updated 11 months ago