Code for the ICASSP 2024 paper "WhisperSeg: Positive Transfer of the Whisper Speech Transformer to Human and Animal Voice Activity Detection"
☆41 · Jul 25, 2025 · Updated 10 months ago
Alternatives and similar repositories for WhisperSeg
Users that are interested in WhisperSeg are comparing it to the libraries listed below.
- A repository for code used to produce the results in the ICASSP 2024 paper "SELF-SUPERVISED PRETRAINING FOR ROBUST PERSONALIZED VOICE ACTIV… ☆24 · Nov 25, 2024 · Updated last year
- Tr-VAD: An Efficient Transformer based Voice Activity Detection Model ☆18 · Aug 1, 2024 · Updated last year
- A core package for acoustic communication research in Python ☆43 · Feb 24, 2026 · Updated 3 months ago
- Simple Python algorithms for segmenting animal (songbird, mice) vocalizations into notes and syllables using Dynamic Thresholding and Con… ☆27 · Apr 12, 2021 · Updated 5 years ago
- BioAcoustic Collection Pipeline ☆67 · Jun 5, 2026 · Updated last week
- Deep Audio Segmenter ☆33 · Mar 15, 2026 · Updated 3 months ago
- A list of publicly available online bioacoustics datasets that can be used with deep learning. ☆43 · May 26, 2026 · Updated 3 weeks ago
- PyTorch implementation of "spectro-temporal attention-based voice activity detection" ☆13 · Jun 4, 2024 · Updated 2 years ago
- Implementation of the paper "Attentive Statistics Pooling for Deep Speaker Embedding" in PyTorch ☆49 · Jun 4, 2020 · Updated 6 years ago
- This is the Python library for an unsupervised, fast method for robust voice activity detection (rVAD), as in the paper rVAD: An Unsuperv… ☆154 · Jun 5, 2025 · Updated last year
- 3D Sound Source Localization using Masked Autoencoders ☆20 · Feb 12, 2025 · Updated last year
- Fork of Liu Feng's CoverHunter to run on a single computer, plus more features and documentation. ☆24 · May 10, 2026 · Updated last month
- This is a demo project showing how to fine-tune and deploy the Whisper model on SageMaker. ☆26 · Dec 20, 2023 · Updated 2 years ago
- acoss: Audio Cover Song Suite is a framework for feature extraction and benchmarking for the cover song identification (CSI) task ☆39 · Jul 6, 2023 · Updated 2 years ago
- Denoising methods for animal vocalizations ☆25 · Dec 3, 2025 · Updated 6 months ago
- Pre-trained models for bioacoustic classification tasks ☆66 · May 3, 2026 · Updated last month
- audio, NLP, ML with huggingface, nvidia/nemo, speechbrain ☆11 · Sep 4, 2023 · Updated 2 years ago
- A scalable solution that simplifies the integration of ComfyUI for developers ☆11 · Jul 15, 2024 · Updated last year
- Silero VAD (ncnn): pre-trained enterprise-grade Voice Activity Detector. ☆26 · Aug 21, 2024 · Updated last year
- Thesia is a Multi-track Spectrogram / Waveform viewer ☆19 · Jun 11, 2026 · Updated last week
- A unified framework for Low-resource Audio Processing and Evaluation (SSL Pre-training and Downstream Fine-tuning) ☆29 · Jul 9, 2024 · Updated last year
- Offline Speaker Diarization with SenseVoice by Sherpa ONNX. ☆15 · Dec 23, 2024 · Updated last year
- ☆13 · May 23, 2024 · Updated 2 years ago
- OpenPyVision is a real-time video mixer based on OpenCV and PyQt6. ☆14 · Aug 22, 2024 · Updated last year
- Code and dataset for Polyglot Prompting: Multilingual Multitask Prompt Training. ☆18 · Dec 7, 2022 · Updated 3 years ago
- Reproducible experimental protocols for multimedia (audio, video, text) database ☆119 · Mar 1, 2026 · Updated 3 months ago
- Octopus is a neural machine generation toolkit for Arabic Natural Language Generation (NLG) ☆10 · Apr 29, 2024 · Updated 2 years ago
- Speaker Verification using PyTorch ☆13 · May 23, 2024 · Updated 2 years ago
- Final training script from the Hugging Face Whisper fine-tuning event, used to get the best results from the fine-tuned model. ☆12 · Dec 24, 2022 · Updated 3 years ago
- A Dataset for Cover Song Identification and Understanding ☆65 · Feb 23, 2023 · Updated 3 years ago
- Official repository for the "Powerset multi-class cross entropy loss for neural speaker diarization" paper published in Interspeech 2023. ☆96 · Oct 18, 2023 · Updated 2 years ago
- ☆45 · Dec 15, 2022 · Updated 3 years ago
- Real Time Chat Application ☆14 · Dec 20, 2022 · Updated 3 years ago
- Fine-tunes the GPT-Neo model for the Thai language. ☆12 · Jun 30, 2021 · Updated 4 years ago
- Self-Supervised Speech/Sound Pre-training and Representation Learning Toolkit ☆13 · Nov 18, 2022 · Updated 3 years ago
- Proof-of-concept conversation orchestrator with a speech-language model ☆20 · Oct 19, 2024 · Updated last year
- State-of-the-art models for diacritics restoration for the Arabic language ☆16 · Feb 23, 2025 · Updated last year
- Transcribe desktop audio/computer audio in real-time and locally (Streaming ASR), using TorchAudio and Emformer-RNNT model for inference,… ☆14 · May 7, 2024 · Updated 2 years ago
- Implementation and Deployment of Multilingual Custom Keyword Spotting Running in Real-time on an Edge Device. ☆11 · Apr 27, 2023 · Updated 3 years ago