igormq/speech2text


igormq / speech2text

☆12

Alternatives and similar repositories for speech2text

Users who are interested in speech2text compare it to the repositories listed below.


rafaelpadilla / TCF-LMO
TCF-LMO is a network made with dedicated modules to process videos and identify the presence of anomalies in frames. It is composed of: d…
☆12 · Aug 28, 2024 · Updated last year
MuyangDu / T5Voice
T5Voice is a lightweight PyTorch implementation of T5-based text-to-speech synthesis, supporting both streaming and non-streaming speech …
☆28 · Nov 7, 2025 · Updated 8 months ago
sotaque-brasileiro / sotaque-brasileiro
A database for studying Brazilian regionalisms through voice.
☆11 · May 2, 2023 · Updated 3 years ago
lucasgris / wav2vec4bp
Wav2vec resources and models for Brazilian Portuguese
☆36 · Jul 15, 2022 · Updated 4 years ago
octos-ai / octos-academy
Octos team training for new/future team members and/or other interested people
☆14 · Jun 22, 2022 · Updated 4 years ago
rmarcacini / ser-coraa-pt-br
Emotion Recognition from Brazilian Portuguese Informal Spontaneous Speech
☆22 · Mar 21, 2022 · Updated 4 years ago
bryan051003 / USVG
A unified model for zero-shot singing voice conversion and synthesis
☆22 · Nov 30, 2022 · Updated 3 years ago
mtreviso / deepbond
Deep neural approach to Boundary and Disfluency Detection - Based on my Master's work
☆20 · Jul 25, 2024 · Updated last year
IS2AI / Kazakh_ASR
☆16 · Aug 1, 2025 · Updated 11 months ago
KrishnaDN / E2E_ASR_Confidence_Estimation
Implementation of the paper "Confidence estimation for attention based sequence to sequence models for speech recognition"
☆16 · May 9, 2021 · Updated 5 years ago
openaudiolab / LLaST
LLaST: Improved End-to-end Speech Translation System Leveraged by Large Language Models
☆26 · Aug 11, 2024 · Updated last year
freds0 / CML-TTS-Dataset
CML-TTS: A Multilingual Dataset for Speech Synthesis
☆36 · Jul 31, 2024 · Updated last year
Pallas1303 / FestPB
FestPB is a project that aims to add Brazilian Portuguese support to the Festival Speech Synthesis text-to-speech software. …
☆10 · May 5, 2024 · Updated 2 years ago
haewonc / LatProtRL
[ICML 24] Robust Optimization in Protein Fitness Landscapes Using Reinforcement Learning in Latent Space
☆16 · Aug 9, 2024 · Updated last year
zyascend / End-to-End-Speech-Recognition-Learning
ASR, End-to-End, end2end, Speech Recognition, 端到端语音识别
☆12 · Oct 25, 2020 · Updated 5 years ago
rodrigokrosa / tacotron2-GL-brazillian-portuguese
Repository documenting the results of a Tacotron 2 adaptation for Brazilian Portuguese.
☆17 · Sep 8, 2022 · Updated 3 years ago
shinhyeokoh / rwen
☆14 · Jun 16, 2023 · Updated 3 years ago
JunyiPeng00 / SLT22_MultiHead-Factorized-Attentive-Pooling
An attention-based backend allowing efficient fine-tuning of transformer models for speaker verification
☆24 · Sep 22, 2024 · Updated last year
igormq / aes-lac-2018
PyTorch code for "A new automatic speech recognizer for Brazilian Portuguese based on deep neural networks and transfer learning" submitte…
☆21 · Sep 30, 2019 · Updated 6 years ago
NoSavedDATA / Neve
NSK Coding Language: Fast and Simple
☆15 · Jul 6, 2026 · Updated 2 weeks ago
khfs / DuplexMamba
☆18 · Mar 6, 2026 · Updated 4 months ago
miccio-dk / NISQA
NISQA - Non-Intrusive Speech Quality and TTS Naturalness Assessment
☆16 · Apr 13, 2022 · Updated 4 years ago
WangHelin1997 / Aty-TTS
Aty-TTS: Improving fairness for spoken language understanding in atypical speech with Text-to-Speech
☆11 · May 14, 2025 · Updated last year
lwang114 / UnsupTTS
☆37 · Mar 26, 2024 · Updated 2 years ago
chutaklee / CantoASR
Fine-tuning Wav2Vec 2.0 on Common Voice (zh-HK)
☆16 · May 8, 2022 · Updated 4 years ago
kjw11 / CSEnet-ASR
Cross-Speaker Encoding Network for Multi-talker Speech Recognition
☆12 · Mar 14, 2025 · Updated last year
IS2AI / Uzbek_ASR
☆12 · Aug 9, 2021 · Updated 4 years ago
llm-lab-org / CLASP
CLASP: Contrastive Language-Speech Pretraining for Multilingual Multimodal Information Retrieval
☆13 · Jun 27, 2025 · Updated last year
maum-ai / sane-tts
SANE-TTS: Stable And Natural End-to-End Multilingual Text-to-Speech
☆11 · Jun 30, 2023 · Updated 3 years ago
freds0 / katube
KATube is a tool to automate the process of creating datasets for training Text-To-Speech (TTS) and Speech-To-Text (STT) models. From a l…
☆26 · Jul 27, 2024 · Updated last year
freds0 / kabooks
KABooks is a tool to automate the process of creating datasets for training Text-To-Speech (TTS) and Speech-To-Text (STT) models. Using a…
☆13 · Mar 24, 2023 · Updated 3 years ago
Edresson / Wav2Vec-Wrapper
An easy way to fine-tune Wav2Vec 2.0 for low-resource languages.
☆80 · May 20, 2023 · Updated 3 years ago
ikushlianski / football-score-simulator-2017
The app simulates football matches. Choose team names, relative strength, home crowd support, tactics and other factors to try to accurat…
☆16 · Mar 17, 2024 · Updated 2 years ago
pkufool / simple-wer
A simple command line tool to calculate WER for ASR.
☆14 · Oct 14, 2024 · Updated last year
audiodemo / voice-conversion
Vocoder-Free Non-Parallel Conversion of Whispered Speech With Masked Cycle-Consistent Generative Adversarial Networks
☆17 · Aug 18, 2023 · Updated 2 years ago
reppy4620 / convnext_tts
Unofficial implementation of ConvNeXt-TTS powered by lightning
☆18 · Oct 20, 2024 · Updated last year
voidful / wav2vec2-xlsr-multilingual-56
56 languages, 1 model: multilingual ASR
☆24 · Jul 25, 2021 · Updated 4 years ago
ashi-ta / speechGLUE
SpeechGLUE is a speech version of the GLUE benchmark, driven by text-to-speech.
☆13 · Jun 2, 2023 · Updated 3 years ago
jiasenlu / YOLOv3.pytorch
PyTorch implementation of YOLOv3
☆11 · Aug 30, 2018 · Updated 7 years ago