NINAnor / rare_species_detections
Repository for fine-tuning BEATs and for using BEATs as a feature extractor in a prototypical network. This repository was used to take part in the DCASE 2023 challenge on few-shot bioacoustic event detection.
☆34 Updated last month
Alternatives and similar repositories for rare_species_detections:
Users who are interested in rare_species_detections are comparing it to the repositories listed below.
- A repo containing download guidance and corresponding scripts of the VoxBlink dataset. ☆26 Updated last year
- ☆55 Updated 2 years ago
- Streaming Audiotransformers for online Audio tagging ☆44 Updated 10 months ago
- PyTorch implementation of "CleanMel: Mel-Spectrogram Enhancement for Improving Both Speech Quality and ASR". ☆50 Updated 3 weeks ago
- ☆45 Updated last month
- Source for the Interspeech 2024 Paper "Scaling up masked audio encoder learning for general audio classification" ☆64 Updated 2 weeks ago
- ☆51 Updated 6 months ago
- A 6-million Audio-Caption Paired Dataset Built with an LLM- and ALM-based Automatic Pipeline ☆129 Updated 4 months ago
- Official repository for Mamba-based Segmentation Model for Speaker Diarization ☆35 Updated 6 months ago
- Official implementation for Fast-HuBERT: An Efficient Training Framework for Self-Supervised Speech Representation Learning ☆89 Updated 5 months ago
- The project is associated with the recently-launched INTERSPEECH 2025 Workshop on Multilingual Conversational Speech Language Model (MLC-… ☆33 Updated 3 weeks ago
- Source code for Consistent ensemble distillation for audio tagging ☆30 Updated 9 months ago
- SLT 2024 Mandarin Stuttering Event Detection and Automatic Speech Recognition Challenge ☆13 Updated 10 months ago
- ☆47 Updated 7 months ago
- Analysis of XLS-R for Speech Quality Assessment ☆13 Updated 2 months ago
- Learning differentiable temporal resolution on time-series data. ☆36 Updated 2 years ago
- [ICLR 2025] Enhancing Self-Supervised Models with Audio Mixtures for Polyphonic Soundscapes ☆34 Updated 2 weeks ago
- Official implementation of DNSMOS Pro (accepted at INTERSPEECH 2024). ☆38 Updated last month
- A large-scale speech corpus introduced in Spark-TTS, built from diverse open-source datasets for training text-to-speech (TTS) systems. ☆59 Updated 3 weeks ago
- AutoPrep: An Automatic Preprocessing Framework for In-the-Wild Speech Data ☆31 Updated last year
- ☆30 Updated 10 months ago
- SCOREQ: Speech COntrastive REgression for Quality Assessment (NeurIPS 2024) ☆71 Updated 3 months ago
- ☆43 Updated 10 months ago
- Contains the code associated with the ICLR submission for our text-to-speech diffusion model ☆53 Updated last year
- Codebase for 'Scaling Rich Style-Prompted Text-to-Speech Datasets' ☆119 Updated last month
- Survey on speech generation work. ☆18 Updated last year
- TS-SEP: Joint Diarization and Separation Conditioned on Estimated Speaker Embeddings ☆31 Updated 7 months ago
- Clustering-based methods for overlapping diarization ☆81 Updated last year
- BAE-NET: A LOW COMPLEXITY AND HIGH FIDELITY BANDWIDTH-ADAPTIVE NEURAL NETWORK FOR SPEECH SUPER-RESOLUTION ☆67 Updated 8 months ago
- PAM is a no-reference audio quality metric for audio generation tasks ☆60 Updated 9 months ago