mispchallenge/MISP-ICME-AVSR

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/mispchallenge/MISP-ICME-AVSR)

mispchallenge / MISP-ICME-AVSR

☆17

Alternatives and similar repositories for MISP-ICME-AVSR

Users that are interested in MISP-ICME-AVSR are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

dalision / IVA-Xception
View on GitHub
IVA-Xception model which can achieve high performance in identifying multiple birds from overlapping bird sounds recordings based on IVA …
☆16Oct 20, 2021Updated 4 years ago
sungnyun / avsr-temporal-dynamics
View on GitHub
(SLT 2024) Learning Video Temporal Dynamics with Cross-Modal Attention for Robust Audio-Visual Speech Recognition
☆13Oct 22, 2024Updated last year
david-gimeno / tailored-avsr
View on GitHub
Official source code for the paper "Tailored Design of Audio-Visual Speech Recognition Models using Branchformers"
☆15Feb 24, 2025Updated last year
sungnyun / cav2vec
View on GitHub
(ICLR 2025) Multi-Task Corrupted Prediction for Learning Robust Audio-Visual Speech Representation
☆16Apr 29, 2025Updated last year
mispchallenge / MISP2021-AVSR
View on GitHub
repository for paper "Audio-Visual Speech Recognition in MISP2021 Challenge: Dataset Release and Deep Analysis"
☆18Jun 17, 2022Updated 4 years ago
1-Click AI Models by DigitalOcean Gradient • Ad
Deploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click. Zero configuration with optimized deployments.
ahaliassos / usr2
View on GitHub
PyTorch implementation of USR 2.0 (ICLR 2026)
☆15Apr 3, 2026Updated 3 months ago
Exgc / AVMuST-TED
View on GitHub
☆24Mar 30, 2024Updated 2 years ago
joannahong / AV-RelScore
View on GitHub
Audio-Visual Corruption Modeling of our paper "Watch or Listen: Robust Audio-Visual Speech Recognition with Visual Corruption Modeling an…
☆35Jun 20, 2023Updated 3 years ago
talhanai / wer-sigtest
View on GitHub
Script to perform statistical significance test between ASR hypotheses.
☆23Aug 13, 2017Updated 8 years ago
YasserdahouML / VSR_test_set
View on GitHub
WildVSR
☆22Dec 13, 2023Updated 2 years ago
ms-dot-k / AVSR
View on GitHub
PyTorch implementation of "Watch or Listen: Robust Audio-Visual Speech Recognition with Visual Corruption Modeling and Reliability Scorin…
☆23Apr 3, 2024Updated 2 years ago
nguyenvulebinh / AVSRCocktail
View on GitHub
Audio-Visual Speech Recognition
☆26Jul 7, 2025Updated last year
Exgc / OpenSR
View on GitHub
The official implementation of OpenSR (ACL2023 Oral)
☆17Nov 29, 2023Updated 2 years ago
chaufanglin / Normal2Whisper
View on GitHub
Implementation of "Improving Whispered Speech Recognition Performance using Pseudo-whispered based Data Augmentation"
☆14Oct 31, 2024Updated last year
Managed hosting for WordPress and PHP on Cloudways • Ad
Managed hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
XLearning-SCU / 2024-IJCV-LCNL
View on GitHub
☆12Feb 2, 2024Updated 2 years ago
Sreyan88 / LipGER
View on GitHub
Code for InterSpeech 2024 Paper: LipGER: Visually-Conditioned Generative Error Correction for Robust Automatic Speech Recognition
☆19Jul 16, 2024Updated 2 years ago
huckiyang / Interspeech23-Tutorial-Para-Efficient-Cross-Modal-Tutorial
View on GitHub
Interspeech Tutorial - Resource Efficient and Cross-Modal Learning Toward Foundation Modeling
☆15Oct 9, 2023Updated 2 years ago
mispchallenge / misp2021_baseline
View on GitHub
☆29Jun 15, 2022Updated 4 years ago
YUCHEN005 / UniVPM
View on GitHub
Code for paper "Hearing Lips in Noise: Universal Viseme-Phoneme Mapping and Transfer for Robust Audio-Visual Speech Recognition"
☆28Jun 21, 2023Updated 3 years ago
LUMIA-Group / ConceptLM
View on GitHub
Official Implementation of ConceptLM.
☆23Mar 18, 2026Updated 4 months ago
mispchallenge / MISP-2023-Challenge-Baseline
View on GitHub
☆25Jan 2, 2024Updated 2 years ago
JeongHun0716 / e-mvsr
View on GitHub
Efficient Training for Multilingual Visual Speech Recognition: Pre-training with Discretized Visual Speech Representation (ACM MM 2024)
☆20Mar 17, 2025Updated last year
TaoRuijie / AVCleanse
View on GitHub
ICASSP 2023: 'Speaker recognition with two-step multi-modal deep cleansing'
☆44Oct 31, 2022Updated 3 years ago
AI Agents on DigitalOcean Gradient AI Platform • Ad
Build production-ready AI agents using customizable tools or access multiple LLMs through a single endpoint. Create custom knowledge bases or connect external data.
roudimit / whisper-flamingo
View on GitHub
Whisper-Flamingo [Interspeech 2024] and mWhisper-Flamingo [IEEE SPL 2025] for Audio-Visual Speech Recognition and Translation
☆210Jul 29, 2025Updated 11 months ago
jmandel / fun-with-formants
View on GitHub
Speech formant tracking code in Python
☆15Oct 10, 2013Updated 12 years ago
XLearning-SCU / 2024-TIP-CREAM
View on GitHub
PyTorch implementation for Cross-modal Retrieval with Noisy Correspondence via Consistency Refining and Mining (TIP 2024)
☆22Mar 25, 2024Updated 2 years ago
XLearning-SCU / 2023-ICCV-COMMON
View on GitHub
This repo contains the code and data of "Graph Matching with Bi-level Noisy Correspondence".
☆20Jul 28, 2023Updated 2 years ago
jcsilva / multilingual-g2p
View on GitHub
Multilingual Grapheme to Phoneme
☆50Feb 23, 2016Updated 10 years ago
choijeongsoo / lip2speech-unit
View on GitHub
[Interspeech 2023] Intelligible Lip-to-Speech Synthesis with Speech Units
☆47Oct 26, 2024Updated last year
prajwalkr / transpotter
View on GitHub
Official implementation of Transpotter, published in BMVC 2021
☆16Aug 6, 2022Updated 3 years ago
YUCHEN005 / GILA
View on GitHub
Code for paper "Cross-Modal Global Interaction and Local Alignment for Audio-Visual Speech Recognition"
☆18Jun 21, 2023Updated 3 years ago
gustavo-beck / wavebender-gan
View on GitHub
☆25Sep 27, 2022Updated 3 years ago
Virtual machines for every use case on DigitalOcean • Ad
Get dependable uptime with 99.99% SLA, simple security tools, and predictable monthly pricing with DigitalOcean's virtual machines, called Droplets.
georgesterpu / avsr-tf1
View on GitHub
Audio-Visual Speech Recognition using Sequence to Sequence Models
☆84Jul 10, 2020Updated 6 years ago
suhaspillai / Speech-Recognition-Impaired-Speech
View on GitHub
Speech Recognition for speakers with speech disorders due to diseases like Cerebral Palsy, Parkinson or Amyotrophic Lateral Sclerosis ALS…
☆23Mar 26, 2017Updated 9 years ago
Yuanshi9815 / LiteFocus
View on GitHub
[Interspeech 2024] LiteFocus is a tool designed to accelerate diffusion-based TTA model, now implemented with the base model AudioLDM2.
☆34Mar 11, 2025Updated last year
bot-ssttkkl / nonebot-plugin-mahjong-utils
View on GitHub
Nonebot麻将小工具插件，支持手牌分析、番符点数查询
☆11Apr 14, 2025Updated last year
Jiang-Yidi / TS-TalkNet
View on GitHub
INTERSPEECH2023: Target Active Speaker Detection with Audio-visual Cues
☆61May 29, 2023Updated 3 years ago
uark-cviu / Right2Talk
View on GitHub
[ICCV'21] The Right to Talk: An Audio-Visual Transformer Approach
☆20Aug 2, 2021Updated 4 years ago
XLearning-SCU / 2023-CVPR-FCMI
View on GitHub
☆21Jun 3, 2023Updated 3 years ago