IBM / MAX-Speech-to-Text-Converter
Converts spoken words into text form.
☆76 · Sep 17, 2025 · Updated 4 months ago
Alternatives and similar repositories for MAX-Speech-to-Text-Converter
Users who are interested in MAX-Speech-to-Text-Converter are comparing it to the libraries listed below.
- Generate English-language text similar to the news articles in the One Billion Words data set. ☆26 · Sep 17, 2025 · Updated 5 months ago
- PyTorch speech2text inference script for the NVIDIA OpenSeq2Seq wav2letter model variant ☆10 · Aug 12, 2019 · Updated 6 years ago
- Generate a summarized description of a body of text ☆27 · Sep 17, 2025 · Updated 4 months ago
- Identify objects in an image, additionally assigning each pixel of the image to a particular object ☆31 · Sep 17, 2025 · Updated 5 months ago
- A library of speech gadgets. ☆14 · Oct 15, 2022 · Updated 3 years ago
- ☆12 · Aug 25, 2017 · Updated 8 years ago
- Experiment with "one-shot learning" techniques to recognize a voice signature ☆24 · Mar 29, 2020 · Updated 5 years ago
- Image classifier for physical places/locations, based on the Places365-CNN Model ☆42 · Sep 17, 2025 · Updated 5 months ago
- Implementation of different noise embeddings for noise-aware training of Kaldi acoustic models. ☆13 · Feb 13, 2021 · Updated 5 years ago
- Train a neural network component that can add spatial transformations such as translation and rotation to larger models. ☆10 · Apr 18, 2019 · Updated 6 years ago
- This repo contains the baseline model recipes and pre-trained model for the GramVanni Hindi ASR challenge ☆15 · Mar 26, 2022 · Updated 3 years ago
- Evaluation of STT models for the German language ☆15 · Jan 22, 2022 · Updated 4 years ago
- CoaT: Co-Scale Conv-Attentional Image Transformers ☆16 · Apr 20, 2021 · Updated 4 years ago
- Generate a new image that mixes the content of a source image with the style of another image. ☆51 · Sep 17, 2025 · Updated 5 months ago
- ☆22 · Dec 31, 2025 · Updated last month
- FinRAD: Financial Readability Assessment Dataset - 13,000+ Definitions of Financial Terms for Measuring Readability ☆15 · Nov 2, 2024 · Updated last year
- Identify objects in images using a first-generation deep residual network. ☆15 · Sep 17, 2025 · Updated 5 months ago
- Code for the paper "Deformable Butterfly: A Highly Structured and Sparse Linear Transform". ☆16 · Nov 1, 2021 · Updated 4 years ago
- Protect communications with adversarial neural cryptography. ☆11 · Oct 31, 2018 · Updated 7 years ago
- Real-time speech enhancement based on spectral subtraction ☆16 · Feb 18, 2018 · Updated 7 years ago
- This repository provides data and code for the paper "Vox Populi, Vox DIY: Benchmark Dataset for Crowdsourced Audio Transcription". ☆16 · Jul 22, 2021 · Updated 4 years ago
- 🎧 Automatic Speech Recognition: DeepSpeech & Seq2Seq (TensorFlow) ☆223 · Jun 15, 2020 · Updated 5 years ago
- Experiments with Hugging Face 🔬 🤗 ☆46 · Aug 20, 2024 · Updated last year
- Adapt Kaldi-ASR nnet3 chain models from Zamia-Speech.org to a different language model ☆33 · Jan 26, 2020 · Updated 6 years ago
- Identify sounds in short audio clips ☆156 · Sep 17, 2025 · Updated 5 months ago
- Simple Kaldi model server for chain (nnet3) models in online recognition mode directly from a local microphone ☆35 · Feb 18, 2022 · Updated 3 years ago
- From a large speech audio file and its corresponding body of text, automatically chunk the audio and text into (phrase, audio_snippet) pa… ☆17 · May 15, 2015 · Updated 10 years ago
- Running Mozilla's implementation of Baidu DeepSpeech on Google Colaboratory ☆16 · Mar 18, 2019 · Updated 6 years ago
- Losses and decoders for end-to-end ASR and OCR ☆34 · Oct 30, 2020 · Updated 5 years ago
- A Python interface to OpenFst (fix FstDrawer interface issue for 1.6 version) ☆17 · Apr 2, 2018 · Updated 7 years ago
- Detect emotion from audio ☆13 · Nov 20, 2018 · Updated 7 years ago
- NU-Wave 2: A General Neural Audio Upsampling Model for Various Sampling Rates [WIP] ☆25 · Jul 5, 2022 · Updated 3 years ago
- Wake word spotting with Kaldi ☆19 · Dec 3, 2020 · Updated 5 years ago
- A handy dataset of noises for ASR ☆22 · May 29, 2019 · Updated 6 years ago
- Vim Speech Recognition Experiments ☆20 · May 30, 2025 · Updated 8 months ago
- Date of Concorde In January of the year 1976 after 29 years of the first to penetrate to the speed of sound military aircraft jet - Two C… ☆10 · Feb 11, 2017 · Updated 9 years ago
- Korean ASR Corpus generated from TEDx talks ☆27 · Jan 11, 2019 · Updated 7 years ago
- HMM, CTC, RNN-Transducer, forward-backward algorithm ☆20 · Sep 5, 2023 · Updated 2 years ago
- Code Smell Detector able to detect a set of 16 Android-specific design flaws ☆24 · Nov 13, 2019 · Updated 6 years ago