JackSyu/Discriminative-Multi-modality-Speech-Recognition

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/JackSyu/Discriminative-Multi-modality-Speech-Recognition)

JackSyu / Discriminative-Multi-modality-Speech-Recognition

TF code for our CVPR2020 paper "Discriminative Multi-modality Speech Recognition"

☆26

Alternatives and similar repositories for Discriminative-Multi-modality-Speech-Recognition

Users that are interested in Discriminative-Multi-modality-Speech-Recognition are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

ahstarwab / Violence_Detection
View on GitHub
Online and real-time violence recognition
☆16Jul 5, 2022Updated 4 years ago
upskyy / Automatic-Speech-Recognition-Models
View on GitHub
End-to-End Korean Automatic Speech Recognition leveraging PyTorch and Hydra.
☆10Jan 21, 2022Updated 4 years ago
ms-dot-k / Multi-head-Visual-Audio-Memory
View on GitHub
PyTorch implementation of "Distinguishing Homophenes using Multi-Head Visual-Audio Memory" (AAAI2022)
☆27Mar 9, 2024Updated 2 years ago
mpc001 / Lipreading_using_Temporal_Convolutional_Networks
View on GitHub
ICASSP'22 Training Strategies for Improved Lip-Reading; ICASSP'21 Towards Practical Lipreading with Distilled and Efficient Models; ICASS…
☆438May 18, 2023Updated 3 years ago
placebokkk / ctc-asr
View on GitHub
pytorch CTC implementation for ASR. Use eesen's fst decoder framework
☆10Feb 27, 2020Updated 6 years ago
1-Click AI Models by DigitalOcean Gradient • Ad
Deploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click. Zero configuration with optimized deployments.
xing96 / MIM-lipreading
View on GitHub
Code and model for paper <Mutual Information Maximization for Effective Lip Reading>
☆19Sep 4, 2020Updated 5 years ago
shleee47 / Sound-Source-Localization
View on GitHub
Sound Source Localization for AI Grand Challenge 2021
☆21Feb 7, 2022Updated 4 years ago
prajwalkr / transpotter
View on GitHub
Official implementation of Transpotter, published in BMVC 2021
☆16Aug 6, 2022Updated 3 years ago
ms-dot-k / LRW_ID
View on GitHub
The speaker-labeled information of LRW dataset, which is the outcome of the paper "Speaker-adaptive Lip Reading with User-dependent Paddi…
☆10Oct 12, 2023Updated 2 years ago
yunyikristy / global_local
View on GitHub
☆14Oct 7, 2021Updated 4 years ago
arxrean / LipRead-seq2seq
View on GitHub
An unofficial (PyTorch) implementation for the paper Deep Lip Reading: A comparison of models and an online application.
☆10May 13, 2020Updated 6 years ago
smeetrs / deep_avsr
View on GitHub
A PyTorch implementation of the Deep Audio-Visual Speech Recognition paper.
☆244Feb 15, 2024Updated 2 years ago
lvrysis / Audio-DNN-Classification
View on GitHub
Deep Neural Networks for audio classification
☆10Apr 11, 2024Updated 2 years ago
x-lixu / pyResults
View on GitHub
A tool for calculating WER (Word Error Rate) in python.
☆14Sep 18, 2024Updated last year
Managed hosting for WordPress and PHP on Cloudways • Ad
Managed hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
DeepPSP / cinc2022
View on GitHub
Heart Murmur Detection from Phonocardiogram Recordings: The George B. Moody PhysioNet Challenge 2022
☆15Jan 6, 2026Updated 6 months ago
lilianemomeni / KWS-Net
View on GitHub
Seeing Wake Words: Audio-visual Keyword Spotting
☆67Sep 16, 2020Updated 5 years ago
prajwalkr / vtp
View on GitHub
Official Implementation of Visual Transformer Pooling for Lip reading
☆41Aug 8, 2022Updated 3 years ago
ms-dot-k / Visual-Audio-Memory
View on GitHub
PyTorch implementation of "Multi-modality Associative Bridging through Memory: Speech Sound Recollected from Face Video" (ICCV2021)
☆22Apr 11, 2022Updated 4 years ago
gengxuelong / wenet_LLM_from_ASLP
View on GitHub
wenet_LLM_from_ASLP
☆15Nov 26, 2024Updated last year
LUMIA-Group / Leveraging-Self-Supervised-Learning-for-AVSR
View on GitHub
Official PyTorch implementation of paper Leveraging Unimodal Self Supervised Learning for Multimodal Audio-Visual Speech Recognition (ACL…
☆67Jul 13, 2022Updated 4 years ago
kooBH / drone-robust-gender-classification
View on GitHub
인명 구조용 드론을 위한 음성 화자 인지 기술
☆31Jan 31, 2023Updated 3 years ago
DataoceanAI / CNVSRC2023Baseline
View on GitHub
Baseline system for CNVSRC2023 (Chinese Continuous Visual Speech Recognition Challenge 2023)
☆23Apr 27, 2024Updated 2 years ago
georgesterpu / avsr-tf1
View on GitHub
Audio-Visual Speech Recognition using Sequence to Sequence Models
☆84Jul 10, 2020Updated 6 years ago
GPUs on demand by Runpod - Special Offer Available • Ad
Run AI, ML, and HPC workloads on powerful cloud GPUs—without limits or wasted spend. Deploy GPUs in under a minute and pay by the second.
mpc001 / end-to-end-lipreading
View on GitHub
Pytorch code for End-to-End Audiovisual Speech Recognition
☆183Nov 18, 2022Updated 3 years ago
MU94W / TTS-Eval
View on GitHub
☆18Aug 9, 2018Updated 7 years ago
VIPL-Audio-Visual-Speech-Understanding / LRW1000--CAS-VSR-W1k
View on GitHub
DenseNet3D Model In "LRW-1000: A Naturally-Distributed Large-Scale Benchmark for Lip Reading in the Wild", https://arxiv.org/abs/1810.069…
☆123Mar 13, 2026Updated 4 months ago
ayushkumartarun / deep-regression-unlearning
View on GitHub
Official repo of the paper Deep Regression Unlearning accepted in ICML 2023
☆16Jun 14, 2023Updated 3 years ago
VIPL-Audio-Visual-Speech-Understanding / learn-an-effective-lip-reading-model-without-pains
View on GitHub
The PyTorch Code and Model In "Learn an Effective Lip Reading Model without Pains", (https://arxiv.org/abs/2011.07557), which reaches the…
☆168Sep 12, 2025Updated 10 months ago
kevaldoshi17 / NVIDIA_AICITY
View on GitHub
Repository for NVIDIA AICITY Challenge
☆15Jul 29, 2021Updated 5 years ago
ajinkyaT / Lip_Reading_in_the_Wild_AVSR
View on GitHub
Audio-Visual Speech Recognition using Deep Learning
☆61Nov 14, 2018Updated 7 years ago
IvonaTau / emotionaldan
View on GitHub
Deep Neural Network for joint emotion classification and landmark localization.
☆21Apr 2, 2019Updated 7 years ago
Jimmy-di / camouflage-poisoning
View on GitHub
Camouflage poisoning via machine unlearning
☆19Jul 3, 2025Updated last year
AI Agents on DigitalOcean Gradient AI Platform • Ad
Build production-ready AI agents using customizable tools or access multiple LLMs through a single endpoint. Create custom knowledge bases or connect external data.
DanielLin94144 / Test-time-adaptation-ASR-SUTA
View on GitHub
Test-time adaptation for speech recognition model by single utterance. The official implementation of "Listen, Adapt, Better WER: Source-…
☆23Apr 1, 2022Updated 4 years ago
uark-cviu / Right2Talk
View on GitHub
[ICCV'21] The Right to Talk: An Audio-Visual Transformer Approach
☆20Aug 2, 2021Updated 4 years ago
speechpro / mixup
View on GitHub
☆24Mar 13, 2020Updated 6 years ago
yangdongchao / DCASE2021Task5
View on GitHub
The code for DCASE2021 task5 submission.
☆20Feb 21, 2022Updated 4 years ago
Friedrich-M / Audio-signal-classification-and-identification
View on GitHub
基于梅尔频谱的信号分类和识别
☆23Mar 31, 2023Updated 3 years ago
afourast / deep_lip_reading
View on GitHub
Code and models for evaluating a state-of-the-art lip reading network
☆196Mar 24, 2023Updated 3 years ago
zjlww / ardit-web
View on GitHub
☆27Aug 2, 2024Updated last year