koudounasalkis / voc2vec
This repository contains the code for the paper "voc2vec: A Foundation Model for Non-Verbal Vocalization", accepted at ICASSP 2025.
☆47 · Apr 14, 2025 · Updated 10 months ago
Alternatives and similar repositories for voc2vec
Users who are interested in voc2vec are comparing it to the libraries listed below.
- Event Relation in Text-to-Audio (TTA) Generation ☆20 · Feb 26, 2025 · Updated 11 months ago
- PyTorch implementation of "Source Separation by Flow Matching (FLOSS)" by Google DeepMind ☆91 · Nov 24, 2025 · Updated 2 months ago
- ☆15 · Apr 2, 2025 · Updated 10 months ago
- Descript Audio Codec - VAE Variant (.dac-vae): High-Fidelity Audio Compression with Variational Autoencoder ☆31 · Aug 30, 2025 · Updated 5 months ago
- EMO-SUPERB submission ☆50 · Oct 13, 2025 · Updated 4 months ago
- ☆15 · Aug 22, 2025 · Updated 5 months ago
- Unofficial implementation of ConvNeXt-TTS powered by Lightning ☆18 · Oct 20, 2024 · Updated last year
- Official Implementation of TSELM: Target speaker extraction using discrete tokens and language models ☆55 · Apr 14, 2025 · Updated 10 months ago
- Train no-reference speech quality estimators with multiple datasets via learned, per-dataset alignments. ☆18 · Aug 1, 2025 · Updated 6 months ago
- [ACL 2025] OZSpeech: One-step Zero-shot Speech Synthesis with Learned-Prior-Conditioned Flow Matching ☆45 · Feb 9, 2025 · Updated last year
- Official Repository for "Efficient Vocal Source Separation Through Windowed RoFormer" ☆42 · Oct 30, 2025 · Updated 3 months ago
- Vox-Profile Benchmark ☆67 · Sep 12, 2025 · Updated 5 months ago
- Onset-and-Offset-Aware Sound Event Detection ☆20 · Feb 10, 2025 · Updated last year
- ☆11 · Nov 7, 2024 · Updated last year
- [ICASSP 2025] AnCoGen: Analysis, Control and Generation of Speech with a Masked Autoencoder ☆12 · Mar 11, 2025 · Updated 11 months ago
- ☆99 · Jan 19, 2026 · Updated 3 weeks ago
- ☆36 · Sep 6, 2025 · Updated 5 months ago
- Inference code for Interspeech 2025 paper, "LSCodec: Low-Bitrate and Speaker-Decoupled Discrete Speech Codec" ☆35 · Oct 23, 2025 · Updated 3 months ago
- ☆15 · Nov 11, 2024 · Updated last year
- ☆15 · Nov 10, 2025 · Updated 3 months ago
- Official PyTorch implementation of (ICME2025 oral) "AutoStyle-TTS: Retrieval-Augmented Generation based Automatic Style Matching Text-to-… ☆17 · Feb 1, 2026 · Updated 2 weeks ago
- ZIQI-Eval: A Music Evaluation Benchmark for Large Language Models ☆16 · Jul 23, 2024 · Updated last year
- ☆22 · Jul 30, 2025 · Updated 6 months ago
- Official Implementation and Dataset of paper - DFADD: The Diffusion and Flow-matching based Audio Deepfake Dataset ☆15 · Apr 7, 2025 · Updated 10 months ago
- Official implementation of the paper "Laughter Synthesis using Pseudo Phonetic Tokens with a Large-scale In-the-wild Laughter Corpus" acc… ☆77 · Jul 16, 2023 · Updated 2 years ago
- Awesome Neural Codec Models, Text-to-Speech Synthesizers & Speech Language Models ☆239 · Dec 18, 2025 · Updated last month
- Code release for "TinySpeech: Attention Condensers for Deep Speech Recognition Neural Networks on Edge Devices" ☆21 · Jun 7, 2025 · Updated 8 months ago
- Forced alignment decoder for Whisper. ☆14 · Mar 13, 2024 · Updated last year
- TTSAudioNormalizer is a specialized tool for TTS data production, featuring descriptive statistical analysis of audio loudness and loud… ☆111 · Dec 20, 2024 · Updated last year
- [ICASSP 2025] "FLowHigh: Towards efficient and high-quality audio super-resolution with single-step flow matching" ☆107 · Jan 17, 2025 · Updated last year
- ☆25 · Mar 6, 2024 · Updated last year
- ☆40 · Jul 15, 2025 · Updated 7 months ago
- The official repository for the paper “NonVerbalSpeech-38K: A Scalable Pipeline for Enabling Non-Verbal Speech Generation and Understandi… ☆63 · Dec 26, 2025 · Updated last month
- This repository contains a series of works on diffusion-based speech tokenizers, including the official implementation of the paper: "TaD… ☆75 · Jan 25, 2026 · Updated 3 weeks ago
- ☆29 · Nov 4, 2025 · Updated 3 months ago
- PyTorch model for context-less phoneme prediction from speech audio ☆30 · Oct 30, 2025 · Updated 3 months ago
- Typing to Listen at the Cocktail Party: Text-Guided Target Speaker Extraction (LLM-TSE) ☆42 · Oct 13, 2023 · Updated 2 years ago
- ☆32 · Jan 6, 2022 · Updated 4 years ago
- E2E TTS using Conditional Flow Matching (Experimental*) ☆71 · Nov 10, 2023 · Updated 2 years ago