Code for ICASSP 2024 paper WhisperSeg: Positive Transfer of the Whisper Speech Transformer to Human and Animal Voice Activity Detection
☆41 · Jul 25, 2025 · Updated 10 months ago
Alternatives and similar repositories for WhisperSeg
Users who are interested in WhisperSeg are comparing it to the libraries listed below.
- A repository for code used to produce the results of the ICASSP 2024 paper: "SELF-SUPERVISED PRETRAINING FOR ROBUST PERSONALIZED VOICE ACTIV… ☆24 · Nov 25, 2024 · Updated last year
- Tr-VAD: An Efficient Transformer based Voice Activity Detection Model ☆18 · Aug 1, 2024 · Updated last year
- A library for viewing songbird brain atlases (European starling, Canary, Zebra finch, Pigeon, Mustached bat) ☆23 · Sep 10, 2019 · Updated 6 years ago
- Visualization and analysis tool for passive acoustic data ☆20 · May 19, 2026 · Updated last week
- ZeroMQ mex bindings for MATLAB ☆21 · Feb 2, 2015 · Updated 11 years ago
- Audio Annotation Tool for ML development ☆88 · May 15, 2026 · Updated 2 weeks ago
- Code for Interspeech2022 paper DeID-VC: Speaker De-identification via Zero-shot Pseudo Voice Conversion ☆13 · May 6, 2023 · Updated 3 years ago
- BioAcoustic Collection Pipeline ☆65 · Updated this week
- Deep Audio Segmenter ☆33 · Mar 15, 2026 · Updated 2 months ago
- This repository gathers the list of online publicly available bioacoustics datasets that can be used together with deep learning. ☆42 · Jan 28, 2026 · Updated 4 months ago
- Implementation of the paper "Attentive Statistics Pooling for Deep Speaker Embedding" in Pytorch ☆49 · Jun 4, 2020 · Updated 5 years ago
- This is the Python library for an unsupervised, fast method for robust voice activity detection (rVAD), as in the paper rVAD: An Unsuperv… ☆155 · Jun 5, 2025 · Updated 11 months ago
- This is a demo project showing how to fine-tune and deploy the Whisper model on SageMaker. ☆26 · Dec 20, 2023 · Updated 2 years ago
- denoising methods used in animal vocalization denoising ☆25 · Dec 3, 2025 · Updated 5 months ago
- animal2vec: A self-supervised transformer for rare-event raw audio input ☆31 · Dec 15, 2025 · Updated 5 months ago
- Pre-trained models for bioacoustic classification tasks ☆65 · May 3, 2026 · Updated 3 weeks ago
- ChatTube: A Retrieval QA System to Youtube Videos ☆10 · Jun 6, 2023 · Updated 2 years ago
- audio, NLP, ML with huggingface, nvidia/nemo, speechbrain ☆11 · Sep 4, 2023 · Updated 2 years ago
- A scalable solution that simplifies the integration of ComfyUI for developers ☆11 · Jul 15, 2024 · Updated last year
- Python Passive Acoustic Analysis tool for Passive Acoustic Monitoring (PAM) ☆51 · May 19, 2026 · Updated last week
- A unified framework for Low-resource Audio Processing and Evaluation (SSL Pre-training and Downstream Fine-tuning) ☆29 · Jul 9, 2024 · Updated last year
- Hybrid convolutional-recurrent neural networks for segmentation of birdsong and classification of elements ☆56 · Feb 10, 2023 · Updated 3 years ago
- A Differentiable Acoustic Guitar Model for String-Specific Polyphonic Synthesis ☆18 · Nov 16, 2023 · Updated 2 years ago
- ☆13 · May 23, 2024 · Updated 2 years ago
- Welcome to my project. OpenPyVision is a real time videoMixer based on opencv and pyqt6. ☆14 · Aug 22, 2024 · Updated last year
- Code and dataset for Polyglot Prompting: Multilingual Multitask Prompt Training. ☆18 · Dec 7, 2022 · Updated 3 years ago
- project website for "depth sensing beyond LiDAR range" ☆11 · Jul 28, 2020 · Updated 5 years ago
- Collection of notebooks exploring conv nets in detail. ☆10 · Sep 14, 2017 · Updated 8 years ago
- Reproducible experimental protocols for multimedia (audio, video, text) database ☆118 · Mar 1, 2026 · Updated 2 months ago
- Speaker Verification using Pytorch ☆13 · May 23, 2024 · Updated 2 years ago
- A neural network framework for researchers studying acoustic communication ☆91 · Mar 13, 2026 · Updated 2 months ago
- Transcribe desktop audio/computer audio in real-time and locally (Streaming ASR), using TorchAudio and Emformer-RNNT model for inference,… ☆14 · May 7, 2024 · Updated 2 years ago
- Final training script from HuggingFace Whisper Fine tuning event - to get best results on finetuned model. ☆12 · Dec 24, 2022 · Updated 3 years ago
- Vietnamese Punctuation Prediction using Pretrained Language Models ☆14 · May 8, 2022 · Updated 4 years ago
- One command to start a streaming ASR server. ☆12 · Oct 2, 2024 · Updated last year
- Fast Punctuation Restoration using Transformer Models for Vietnamese ☆11 · Jun 10, 2022 · Updated 3 years ago
- Fine-tunes the GPT-Neo model for the Thai language. ☆12 · Jun 30, 2021 · Updated 4 years ago
- Thai-English transliteration dictionary ☆18 · Jun 24, 2022 · Updated 3 years ago
- Implementation and Deployment of Multilingual Custom Keyword Spotting Running in Real-time on an Edge Device. ☆11 · Apr 27, 2023 · Updated 3 years ago