HarunoriKawano / Wav2vec2.0

Implementation of the paper "wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations" in PyTorch.

☆55

Alternatives and similar repositories for Wav2vec2.0

Users who are interested in Wav2vec2.0 compare it with the libraries listed below.

usc-sail / trust-ser
Trustworthy Speech Emotion Recognition
☆13 · May 22, 2023 · Updated 2 years ago
igormq / speech2text
☆12 · Feb 9, 2021 · Updated 5 years ago
khanld / Wav2vec2-Pretraining
Wav2vec 2.0 Self-Supervised Pretraining
☆58 · Feb 6, 2025 · Updated last year
Emotional-Text-to-Speech / pytorch-dc-tts
Text to Speech with PyTorch (English and Mongolian)
☆12 · May 3, 2020 · Updated 5 years ago
hongfeixue / StutteringSpeechChallenge
SLT 2024 Mandarin Stuttering Event Detection and Automatic Speech Recognition Challenge
☆12 · Jun 11, 2024 · Updated last year
upskyy / ContextNet
PyTorch implementation of "ContextNet: Improving Convolutional Neural Networks for Automatic Speech Recognition with Global Context" (INT…
☆38 · Feb 27, 2022 · Updated 3 years ago
eastonYi / wav2vec
A simplified version of wav2vec (1.0, vq, 2.0) in fairseq
☆167 · Sep 21, 2020 · Updated 5 years ago
gardner-lab / find-audio
Dynamic time warping for audio matching and alignment
☆22 · Mar 15, 2018 · Updated 7 years ago
HappyColor / Vesper
A Compact and Effective Pretrained Model for Speech Emotion Recognition
☆53 · Jun 29, 2024 · Updated last year
keonlee9420 / Comprehensive-Tacotron2
PyTorch Implementation of Google's Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions. This implementation supp…
☆48 · Jul 31, 2023 · Updated 2 years ago
openaudiolab / LLaST
LLaST: Improved End-to-end Speech Translation System Leveraged by Large Language Models
☆25 · Aug 11, 2024 · Updated last year
JinhuaLiang / lam4fsl
An official repo for the paper "Adapting Language-Audio Models as Few-Shot Audio Learners"
☆31 · May 31, 2023 · Updated 2 years ago
Labbeti / aac-metrics
Metrics for evaluating Automated Audio Captioning systems, designed for PyTorch.
☆68 · Jul 19, 2025 · Updated 6 months ago
babe269 / performant
A toolset for easy formant extraction and visualization from wav files and TTS models
☆33 · Sep 2, 2022 · Updated 3 years ago
usdot-fhwa-stol / carma-streets
CARMA Streets is a component of CARMA ecosystem, which enables such a coordination among different transportation users. This component p…
☆11 · Aug 21, 2025 · Updated 5 months ago
pheepa / DCUnet
Phase-aware speech enhancement with Deep Complex U-Net
☆135 · Mar 22, 2023 · Updated 2 years ago
zafarrafii / REPET-Python
REPeating Pattern Extraction Technique (REPET) in Python for audio source separation: original REPET, REPET extended, adaptive REPET, REP…
☆33 · Feb 16, 2024 · Updated 2 years ago
hcy71o / SC-CNN
SC-CNN: Effective Speaker Conditioning Method for Zero-Shot Multi-Speaker Text-to-Speech Systems
☆39 · Nov 1, 2023 · Updated 2 years ago
chummersone / pywavfile
A lightweight library to read/write wave audio files to/from lists of native Python types.
☆12 · Jun 10, 2024 · Updated last year
Edresson / Wav2Vec-Wrapper
An easy way to fine-tune Wav2Vec 2.0 for low-resource languages.
☆80 · May 20, 2023 · Updated 2 years ago
lijuncheng16 / AudioTaggingDoneRight
Experiments on AudioSet
☆43 · Jul 22, 2023 · Updated 2 years ago
upskyy / Squeezeformer
PyTorch implementation of "Squeezeformer: An Efficient Transformer for Automatic Speech Recognition" (NeurIPS 2022)
☆149 · Nov 22, 2022 · Updated 3 years ago
WangHelin1997 / SpecAugment-plus
A PyTorch implementation of the paper "SpecAugment++: A Hidden Space Data Augmentation Method for Acoustic Scene Classification"
☆34 · Jun 25, 2021 · Updated 4 years ago
hanyuewuxue / ICEEMDAN-for-MT
The GitHub repository for the paper "Denoising Application of Magnetotelluric Low-Frequency Signal Processing"
☆11 · Feb 22, 2023 · Updated 2 years ago
hgamboa / novainstrumentation
Supporting code for instrumentation courses at Universidade Nova de Lisboa - Faculdade de Ciência de Lisboa
☆16 · Oct 7, 2022 · Updated 3 years ago
NINAnor / rare_species_detections
Repository for fine-tuning BEATs and using BEATs as feature extractor in a prototypical network. This repository has been used to complet…
☆34 · Dec 28, 2025 · Updated last month
PeterMS123 / KWS-DS-CNN-for-embedded
TensorFlow training scripts for depthwise separable convolutional neural networks for keyword spotting, and C++ code for deployment.
☆39 · Apr 2, 2020 · Updated 5 years ago
mil-tokyo / bc_learning_sound
Chainer implementation of between-class learning for sound recognition https://arxiv.org/abs/1711.10282
☆96 · Mar 27, 2018 · Updated 7 years ago
imadtoubal / Parkinson-s-Disease-Classification-from-Speech-Data
Parkinson’s Disease Classification from Speech Data using multiple Machine Learning approaches. This was implemented using scikit-learn P…
☆14 · Feb 2, 2020 · Updated 6 years ago
wangfangyuan / SChunk-Encoder
SChunk-Encoder (Transformer or Conformer) for streaming E2E ASR
☆11 · Oct 21, 2022 · Updated 3 years ago
johanazhu / demo
My front-end demo collection
☆11 · Oct 11, 2024 · Updated last year
bytecell / slotminer
Tool for slot extraction from text
☆15 · Oct 23, 2022 · Updated 3 years ago
yosefdayani / MV-RAG
MV-RAG combines retrieval with multi-view generation to create accurate 3D-consistent visuals. By retrieving reference images and text, i…
☆23 · Nov 29, 2025 · Updated 2 months ago
Xinxi-Zhang / Re-MeanFlow
☆43 · Dec 1, 2025 · Updated 2 months ago
xi-j / Mamba-ASR
ConMamba for Automatic Speech Recognition
☆102 · Aug 12, 2024 · Updated last year
ditto-tts / ditto-tts.github.io
Official Demo Page for DiTTo-TTS: Efficient and Scalable Zero-Shot Text-to-Speech with Diffusion Transformer
☆38 · Feb 17, 2025 · Updated last year
wilkinghoff / sub-cluster-AdaCos
Accompanying code for the paper Sub-Cluster AdaCos: Learning Representations for Anomalous Sound Detection.
☆10 · Jun 7, 2022 · Updated 3 years ago
b04901014 / FT-w2v2-ser
Official implementation for the paper Exploring Wav2vec 2.0 fine-tuning for improved speech emotion recognition
☆153 · Oct 26, 2021 · Updated 4 years ago
kehanlu / Mandarin-Wav2Vec2
Pre-trained Wav2vec2.0 for Mandarin
☆43 · Oct 30, 2022 · Updated 3 years ago
