matthijsvk / multimodalSR
Multimodal speech recognition using lipreading (with CNNs) and audio (using LSTMs). Sensor fusion is done with an attention network.
☆69Nov 19, 2022Updated 3 years ago
Alternatives and similar repositories for multimodalSR
Users who are interested in multimodalSR are comparing it to the libraries listed below:
- A fine multimodality fusion network :)☆11Aug 9, 2021Updated 4 years ago
- Speech recognition on the TIMIT (or any other) dataset☆44Nov 2, 2017Updated 8 years ago
- Processing and extraction of face and mouth image files from the TCDTIMIT database☆46Sep 22, 2020Updated 5 years ago
- Python toolkit for Visual Speech Recognition☆38Jun 10, 2020Updated 5 years ago
- 1st place solution to the DCASE 2020 - Task 5 - Urban Sound Tagging with Spatiotemporal Context☆16Dec 8, 2022Updated 3 years ago
- (semi) Grapheme-to-Phoneme (G2P) - seq2seq model using PyTorch for Korean☆23Dec 17, 2017Updated 8 years ago
- calling GNU Octave functions from the Julia language☆11Jan 31, 2025Updated last year
- The Audio Score Alignment Test dataset for Ottoman-Turkish makam music☆11Apr 20, 2017Updated 8 years ago
- Find how to pronounce words by breaking them up into their phones.☆24Jul 7, 2017Updated 8 years ago
- This repository contains code and data for "On the Multimodal Person Verification Using Audio-Visual-Thermal Data"☆12Apr 27, 2023Updated 2 years ago
- Pytorch code for End-to-End Audiovisual Speech Recognition☆184Nov 18, 2022Updated 3 years ago
- Using embedding-based loss functions for phonetics/speech recognition.☆17Nov 24, 2014Updated 11 years ago
- Dual cross modality attention audio-visual speech recognition model based on vgg transformer with hybrid CTC/attention architecture using…☆14Jul 2, 2020Updated 5 years ago
- Multimodal Speech Recognition for phoneme level prediction using Audio-Visual data from TCDTIMIT dataset implementing RNNs with LSTMs for…☆15Jul 27, 2023Updated 2 years ago
- Conformer encoder + Transformer decoder with Hybrid CTC/attention☆12Nov 11, 2021Updated 4 years ago
- Multimodal Emotion Recognition in a video using feature level fusion of audio and visual modalities☆15Jul 5, 2018Updated 7 years ago
- Audio-Visual Speech Recognition using Deep Learning☆61Nov 14, 2018Updated 7 years ago
- A tutorial implementation of voice conversion using PyTorch☆34Mar 30, 2018Updated 7 years ago
- End to End Multiview Lip Reading☆10Jan 26, 2018Updated 8 years ago
- Correspondence and autoencoder neural network training for speech using Pylearn2.☆14Dec 9, 2015Updated 10 years ago
- ABX and kaldi experiments on speech corpora made easy☆33Oct 7, 2024Updated last year
- Urban Sound Classification : striving towards a fair comparison☆17Dec 11, 2020Updated 5 years ago
- Keras implementation of 'LipNet: End-to-End Sentence-level Lipreading'☆681Nov 22, 2022Updated 3 years ago
- Face Landmark-based Speaker-Independent Audio-Visual Speech Enhancement in Multi-Talker Environments☆111Mar 19, 2024Updated last year
- Mel-Generalized Cepstrum analysis☆20Jul 21, 2017Updated 8 years ago
- A dataset for chord coloring and voicing☆20Nov 2, 2020Updated 5 years ago
- Multilingual grapheme-to-phoneme conversion☆20Feb 23, 2018Updated 7 years ago
- A guide and set of tools for working with TinyML powered Audio Sensors☆20Sep 17, 2021Updated 4 years ago
- Audio-Visual Speech Recognition using Sequence to Sequence Models☆83Jul 10, 2020Updated 5 years ago
- Audio feature extraction tool, from wav to h5features format☆19May 24, 2019Updated 6 years ago
- CNN-based audio recognition☆17Feb 13, 2019Updated 7 years ago
- An end-to-end MATLAB toolkit for completely unsupervised Speaker Diarization using state-of-the-art algorithms.☆15Dec 22, 2015Updated 10 years ago
- Audio Visual Speech Recognition☆23Aug 9, 2017Updated 8 years ago
- Multimodal short video classification task, integrating video, image, audio and text modes for short video classification☆19Mar 12, 2020Updated 5 years ago
- Feature extraction of speech signal is the initial stage of any speech recognition system.☆97Sep 3, 2020Updated 5 years ago
- A collection of basic python modules for spoken natural language processing☆55Dec 1, 2019Updated 6 years ago
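- Multimodal data fusion: to perform multimodal data fusion, a multi-input network is first trained for classification using the VGG16 network and the CIFAR-10 dataset. Building on VGG16, the first three feature-extraction layers serve as the feature extractors for the different inputs, features are concatenated at an intermediate layer, the subsequent convolutional layers extract the fused features, and fully connected layers are added at the end. With minor modification, the network can simultaneously extract…☆101Sep 25, 2020Updated 5 years ago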
- Bidirectional dynamic RNN + CTC for phoneme recognition☆46Jun 24, 2020Updated 5 years ago
- Beamforming based binaural speech enhancement as a real time JUCE plugin☆28Apr 29, 2018Updated 7 years ago