☆114May 13, 2025Updated 9 months ago
Alternatives and similar repositories for PretrainedSED
Users that are interested in PretrainedSED are comparing it to the libraries listed below
- ☆16Jun 12, 2025Updated 8 months ago
- ☆28Oct 17, 2024Updated last year
- Implementation of the paper "Variable Bitrate Residual Vector Quantization for Audio Coding"☆11Apr 10, 2025Updated 10 months ago
- Speech Resynthesis and Language Modeling☆27Jun 11, 2025Updated 8 months ago
- This repo includes the official implementations of "Fine-tune the pretrained ATST model for sound event detection".☆158Aug 24, 2025Updated 6 months ago
- Prediction of sound event bounding boxes (SEBBs)☆32Aug 2, 2024Updated last year
- Variable Bitrate Residual Vector Quantization for Audio Coding☆51May 1, 2025Updated 10 months ago
- Onset-and-Offset-Aware Sound Event Detection☆21Feb 10, 2025Updated last year
- ☆22Jul 30, 2025Updated 7 months ago
- 5Hz Deep-Compression Speech VAE for AR-Diffusion and CALMs☆57Nov 19, 2025Updated 3 months ago
- [AAAI 2024] Code for CTX-vec2wav in UniCATS☆130Jun 11, 2024Updated last year
- Ultra-low-bitrate Speech Codec for Speech Language Modeling Applications☆87Dec 20, 2024Updated last year
- Source code for Consistent ensemble distillation for audio tagging☆57Jun 12, 2025Updated 8 months ago
- ☆37Jul 4, 2024Updated last year
- ☆40Feb 18, 2026Updated 2 weeks ago
- Masked Modeling Duo: Towards a Universal Audio Pre-training Framework☆136Feb 23, 2026Updated last week
- Official implementation of MelHuBERT☆68Feb 21, 2026Updated last week
- Spherical residual vector quantization (SRVQ)☆31Aug 25, 2024Updated last year
- Tidy Tunes is an easy-to-use pipeline for mining high-quality audio data for speech generation models. To do so, it chains multiple open …☆22Feb 7, 2026Updated 3 weeks ago
- PyTorch implementation of the ICASSP-24 paper: "Improving Audio Captioning Models with Fine-grained Audio Features, Text Embedding Superv…☆38Jan 6, 2024Updated 2 years ago
- ☆99Jan 19, 2026Updated last month
- This repository contains prompts & best practices to annotate audio clips with a very high degree of details using Audio-Language-Models☆35Oct 13, 2024Updated last year
- My vocoder experiments☆31Jul 26, 2025Updated 7 months ago
- Unofficial implementation of NANSY++ in PyTorch Lightning☆50Mar 11, 2024Updated last year
- Pushing the Limits of Zero-shot End-to-End Speech Translation☆26Dec 12, 2024Updated last year
- My hybrid TTS network that combines VALL-E, VoiceBox, SpeechFlow, Seamless and TortoiseTTS into one☆26Aug 5, 2024Updated last year
- Extract phoneme-level timestamps from speech audio.☆119Updated this week
- (R&D) Text to speech using phonemes as inputs and audio codec codes as outputs. Loosely based on MegaByte, VALL-E and Encodec.☆48Sep 4, 2023Updated 2 years ago
- The official repository for the paper “NonVerbalSpeech-38K: A Scalable Pipeline for Enabling Non-Verbal Speech Generation and Understandi…☆63Dec 26, 2025Updated 2 months ago
- A library built for easier audio self-supervised training, downstream tasks evaluation☆136Sep 25, 2025Updated 5 months ago
- Official Implementation of EnCLAP (ICASSP 2024)☆94Jun 2, 2024Updated last year
- ☆25Jan 24, 2023Updated 3 years ago
- ASiT: Audio Spectrogram vIsion Transformer for General Audio Representation☆28Mar 10, 2024Updated last year
- A 6-million Audio-Caption Paired Dataset Built with an LLM- and ALM-based Automatic Pipeline☆196Dec 13, 2024Updated last year
- [ICASSP 2025] AnCoGen: Analysis, Control and Generation of Speech with a Masked Autoencoder☆12Mar 11, 2025Updated 11 months ago
- A neural speech codec based on discrete WavLM representations☆24Aug 28, 2024Updated last year
- Code for ICLR 2024 Paper: CompA: Addressing the Gap in Compositional Reasoning in Audio-Language Models☆22Jul 10, 2024Updated last year
- LightVoc: An Upsampling-Free GAN Vocoder Based on Conformer and Inverse Short-Time Fourier Transform☆18May 17, 2024Updated last year
- Attention-Enhanced Short-Time Wiener Solution for Acoustic Echo Cancellation☆24Nov 12, 2025Updated 3 months ago