mkunes/w2v2_audioFrameClassification

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/mkunes/w2v2_audioFrameClassification)

mkunes / w2v2_audioFrameClassification

wav2vec2 audio classification for prosodic boundary detection and other tasks

☆42

Alternatives and similar repositories for w2v2_audioFrameClassification

Users that are interested in w2v2_audioFrameClassification are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

zqs01 / data2vecnoisy
View on GitHub
☆11Oct 20, 2022Updated 3 years ago
FrenchKrab / IS2023-powerset-diarization
View on GitHub
Official repository for the "Powerset multi-class cross entropy loss for neural speaker diarization" paper published in Interspeech 2023.
☆96Oct 18, 2023Updated 2 years ago
SSTC-Challenge / SSTC2024_baseline_system
View on GitHub
☆12Jun 14, 2024Updated 2 years ago
LONGXUANX / nano-whisper
View on GitHub
A demo-level low-latency, high-throughput inference engine for whisper
☆20Nov 9, 2025Updated 8 months ago
Maokui-He / NSD-MA-MSE
View on GitHub
A pytorch implementation of the paper "ANSD-MA-MSE: Adaptive Neural Speaker Diarization Using Memory-Aware Multi-Speaker Embedding"
☆62Sep 19, 2024Updated last year
1-Click AI Models by DigitalOcean Gradient • Ad
Deploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click. Zero configuration with optimized deployments.
desh2608 / dover-lap
View on GitHub
Python package for combining diarization system outputs.
☆94Oct 12, 2023Updated 2 years ago
DongKeon / Awesome-Speaker-Diarization
View on GitHub
Some comprehensive papers about speaker diarization
☆367Mar 24, 2026Updated 3 months ago
DDATT / Vits2-onnx-cpp
View on GitHub
Simple inference for Vits2 TTS Using ONNXRUNTIME and espeak-ng on C++
☆19Apr 17, 2024Updated 2 years ago
yucongzh / online_speaker_diarization
View on GitHub
☆15Jul 11, 2022Updated 4 years ago
VoxBlink / ScriptsForVoxBlink
View on GitHub
A repo containing download guidance and corresponding scripts of the VoxBlink dataset.
☆30Apr 16, 2024Updated 2 years ago
BUTSpeechFIT / diacorrect
View on GitHub
Error correction back-end for speaker diarization
☆18Sep 26, 2023Updated 2 years ago
daandouwe / ngram-lm
View on GitHub
A simple n-gram language model.
☆12Sep 11, 2018Updated 7 years ago
egruttadauria98 / SSpaVAlDo
View on GitHub
☆37Jan 6, 2026Updated 6 months ago
BUTSpeechFIT / DVBx
View on GitHub
Discriminative Training of VBx Diarization
☆28Sep 23, 2024Updated last year
Serverless GPU API endpoints on Runpod - Get Bonus Credits • Ad
Skip the infrastructure headaches. Auto-scaling, pay-as-you-go, no-ops approach lets you focus on innovating your application.
seanghay / vits.cpp
View on GitHub
VITS Inference using ONNX Runtime on C++
☆13Dec 25, 2023Updated 2 years ago
BUTSpeechFIT / DiariZen
View on GitHub
A toolkit for speaker diarization.
☆500May 29, 2026Updated last month
liyunlongaaa / AD-TUNING
View on GitHub
AD-TUNING: An Adaptive CHILD-TUNING Approach to Efficient Hyperparameter Optimization of Child Networks for Speech Processing Tasks in th…
☆11Feb 23, 2024Updated 2 years ago
wonjune-kang / llm-speech-summarization
View on GitHub
Prompting Large Language Models with Audio for General-Purpose Speech Summarization
☆20May 14, 2025Updated last year
lovemefan / fsmn-vad
View on GitHub
A enterprise-grade Voice Activity Detector from modelscope and funasr.
☆139Apr 26, 2023Updated 3 years ago
tango4j / llm_speaker_tagging
View on GitHub
SLT 2024 Challenge: Post-ASR-Speaker-Tagging
☆16Jun 16, 2024Updated 2 years ago
sp-uhh / sgmse_crp
View on GitHub
☆32Jan 9, 2024Updated 2 years ago
RicherMans / Datadriven-GPVAD
View on GitHub
The codebase for Data-driven general-purpose voice activity detection.
☆93Aug 3, 2023Updated 2 years ago
merlresearch / tssep
View on GitHub
TS-SEP: Joint Diarization and Separation Conditioned on Estimated Speaker Embeddings
☆43Oct 27, 2025Updated 8 months ago
1-Click AI Models by DigitalOcean Gradient • Ad
Deploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click. Zero configuration with optimized deployments.
BUTSpeechFIT / VBx
View on GitHub
Variational Bayes HMM over x-vectors diarization
☆287Jan 15, 2024Updated 2 years ago
the-bird-F / GLM-Voice-RAG
View on GitHub
[EMNLP 2025 Findings] A complete cross-modal RAG system for end-to-end speech-to-speech large models, including ASR-based Retrieval and E…
☆31Jul 11, 2025Updated last year
ddlBoJack / MT4SSL
View on GitHub
[INTERSPEECH 2023 Best Paper Shortlist] Official implementation for MT4SSL: Boosting Self-Supervised Speech Representation Learning by In…
☆45Mar 25, 2024Updated 2 years ago
Hunterhuan / sphereface2_speaker_verification
View on GitHub
Exploring Binary Classification Loss for Speaker Verification
☆18Jul 18, 2023Updated 3 years ago
manoskary / MusGConv
View on GitHub
Implementation and experiment of the MusGConv paper.
☆15Sep 6, 2024Updated last year
desh2608 / diarizer
View on GitHub
Clustering-based methods for overlapping diarization
☆84Jan 12, 2024Updated 2 years ago
usc-sail / peft-ser
View on GitHub
[ACII 2023] PEFT-SER: On the Use of Parameter Efficient Transfer Learning Approaches For Speech Emotion Recognition Using Pre-trained Spe…
☆60Jul 1, 2024Updated 2 years ago
Yaoming95 / UniPunc
View on GitHub
The case study and multilingfual performance of ICASSP submission
☆24Sep 24, 2022Updated 3 years ago
whzikaros / g2pL
View on GitHub
The implementation of g2pL with a new open dataset.
☆16May 14, 2023Updated 3 years ago
Managed Database hosting by DigitalOcean • Ad
PostgreSQL, MySQL, MongoDB, Kafka, Valkey, and OpenSearch available. Automatically scale up storage and focus on building your apps.
lysanderism / TimeAudio
View on GitHub
The official repository TimeAudio, a comprehensive framework that incorporates fine-grained acoustic cues into LALMs with enhanced module…
☆30Nov 18, 2025Updated 8 months ago
ICASSP2021-tutorial9 / Distant_conversational_ASR_and_analysis
View on GitHub
☆12Jun 10, 2021Updated 5 years ago
jishengpeng / WavReward
View on GitHub
WavReward: Spoken Dialogue Models With Generalist Reward Evaluators
☆56May 15, 2025Updated last year
IDRnD / VoxTube
View on GitHub
The VoxTube dataset official repository
☆71Feb 14, 2024Updated 2 years ago
zhu-han / SpeechLLM
View on GitHub
LLM-based ASR recipe with Zipformer encoder and Qwen LLM
☆34Sep 25, 2025Updated 9 months ago
zhou-feifei / parakeet-tdt-0.6b-v2-Batch-Transcriber
View on GitHub
A high-performance batch audio transcription tool using nvidia/parakeet-tdt-0.6b-v2 to generate accurate, well-segmented SRT subtitles, w…
☆18Dec 9, 2025Updated 7 months ago
ductuantruong / speaker_age_estimation_ssl_study
View on GitHub
[APSIPA'22] Exploring Speaker Age Estimation on Different Self-Supervised Learning Models
☆14Oct 19, 2022Updated 3 years ago