YUCHEN005/GILA

Code for paper "Cross-Modal Global Interaction and Local Alignment for Audio-Visual Speech Recognition"

☆18

Alternatives and similar repositories for GILA

Users that are interested in GILA are comparing it to the libraries listed below.

YUCHEN005 / MIR-GAN
Code for paper "MIR-GAN: Refining Frame-Level Modality-Invariant Representations with Adversarial Network for Audio-Visual Speech Recogni…
☆16 · Jun 21, 2023 · Updated 3 years ago
YUCHEN005 / UNA-GAN
Code for paper "Unsupervised Noise adaptation using Data Simulation"
☆14 · May 16, 2024 · Updated 2 years ago
YUCHEN005 / RATS-Channel-A-Speech-Data
This is a public repository for RATS Channel-A Speech Data, which is a chargeable noisy speech dataset under LDC. Here we release its Log…
☆16 · Oct 22, 2022 · Updated 3 years ago
YUCHEN005 / UniVPM
Code for paper "Hearing Lips in Noise: Universal Viseme-Phoneme Mapping and Transfer for Robust Audio-Visual Speech Recognition"
☆28 · Jun 21, 2023 · Updated 3 years ago
YUCHEN005 / Gradient-Remedy
Code for paper "Gradient Remedy for Multi-Task Learning in End-to-End Noise-Robust Speech Recognition"
☆21 · May 24, 2023 · Updated 3 years ago
YUCHEN005 / DPSL-ASR
Code for paper "Dual-Path Style Learning for End-to-End Noise-Robust Speech Recognition"
☆44 · May 23, 2023 · Updated 3 years ago
Hypotheses-Paradise / Hypo2Trans
Single-blind supplementary materials for NeurIPS 2023 submission
☆94 · Oct 30, 2024 · Updated last year
Hypotheses-Paradise / UADF
☆17 · May 5, 2024 · Updated 2 years ago
YUCHEN005 / RobustGER
Code for paper "Large Language Models are Efficient Learners of Noise-Robust Speech Recognition"
☆143 · May 8, 2024 · Updated 2 years ago
shikiw / Modality-Integration-Rate
[ICCV 2025] The official code of the paper "Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration R…
☆113 · Jul 9, 2025 · Updated last year
YUCHEN005 / STAR-Adapt
Code for paper "Self-Taught Recognizer: Toward Unsupervised Adaptation for Speech Foundation Models"
☆241 · May 24, 2024 · Updated 2 years ago
swagshaw / Rainbow-Keywords
Rainbow Keywords - Official PyTorch Implementation
☆14 · Jun 27, 2024 · Updated 2 years ago
YUCHEN005 / GenTranslate
Code for paper "GenTranslate: Large Language Models are Generative Multilingual Speech and Machine Translators"
☆199 · Jul 22, 2024 · Updated 2 years ago
Sreyan88 / LipGER
Code for InterSpeech 2024 Paper: LipGER: Visually-Conditioned Generative Error Correction for Robust Automatic Speech Recognition
☆19 · Jul 16, 2024 · Updated 2 years ago
wonjune-kang / expressive-speech-retrieval
Expressive Speech Retrieval using Natural Language Descriptions of Speaking Style
☆15 · Aug 18, 2025 · Updated 11 months ago
rithiksachdev / PostASR-Correction-SLT2024
☆18 · Jul 22, 2024 · Updated 2 years ago
tzyll / ChineseHP
Dataset for Pinyin Regularization in Error Correction for Chinese Speech Recognition with Large Language Models in Interspeech 2024.
☆16 · Jul 4, 2024 · Updated 2 years ago
tango4j / llm_speaker_tagging
SLT 2024 Challenge: Post-ASR-Speaker-Tagging
☆16 · Jun 16, 2024 · Updated 2 years ago
hfutmars / MGCL
The complete codes of the paper "Multimodal Graph Contrastive Learning for Recommendation"
☆10 · Mar 20, 2023 · Updated 3 years ago
ms-dot-k / Visual-Audio-Memory
PyTorch implementation of "Multi-modality Associative Bridging through Memory: Speech Sound Recollected from Face Video" (ICCV2021)
☆22 · Apr 11, 2022 · Updated 4 years ago
lin9x / AV-Sepformer
☆65 · Jun 28, 2023 · Updated 3 years ago
archiki / Robust-E2E-ASR
This repository contains the code for our upcoming paper An Investigation of End-to-End Models for Robust Speech Recognition at ICASSP 20…
☆49 · Dec 25, 2024 · Updated last year
jreremy / conformer
PyTorch implementation of conformer with training script for end-to-end speech recognition on the LibriSpeech dataset.
☆29 · May 1, 2024 · Updated 2 years ago
shirley-wu / daco
[NeurIPS 2024 D&B Track] DACO: Towards Application-Driven and Comprehensive Data Analysis via Code Generation
☆14 · Mar 5, 2025 · Updated last year
BriansIDP / AudioVisualLLM
☆19 · May 19, 2024 · Updated 2 years ago
ryuuji06 / keyword-spotting
In this repository, I implement a system for detecting specific spoken words in speech signal. When reading a speech signal, I detect not…
☆19 · Sep 27, 2021 · Updated 4 years ago
LiChenda / Multi-clue-TSE-data
Data simulation scripts for paper "Target Sound Extraction with Variable Cross-modality Clues"
☆17 · May 19, 2023 · Updated 3 years ago
isadrtdinov / kws-attention
Attention-based model for keywords spotting
☆19 · Aug 9, 2021 · Updated 4 years ago
prajwalkr / transpotter
Official implementation of Transpotter, published in BMVC 2021
☆16 · Aug 6, 2022 · Updated 3 years ago
DataoceanAI / CNVSRC2023Baseline
Baseline system for CNVSRC2023 (Chinese Continuous Visual Speech Recognition Challenge 2023)
☆23 · Apr 27, 2024 · Updated 2 years ago
KrishnaDN / Keyword-Transformer
Implementation of the paper "Keyword Transformer: A Self-Attention Model for Keyword Spotting"
☆23 · May 19, 2021 · Updated 5 years ago
burchim / AVEC
[WACV 2023] Audio-Visual Efficient Conformer (AVEC) for Robust Speech Recognition
☆101 · Feb 21, 2023 · Updated 3 years ago
mispchallenge / MISP-ICME-AVSR
☆17 · Jan 1, 2024 · Updated 2 years ago
the-bird-F / GLM-Voice-RAG
[EMNLP 2025 Findings] A complete cross-modal RAG system for end-to-end speech-to-speech large models, including ASR-based Retrieval and E…
☆31 · Jul 11, 2025 · Updated last year
Alibaba-NLP / AISHELL-NER
[ICASSP 2022] AISHELL-NER: Named Entity Recognition from Chinese Speech
☆26 · Apr 20, 2022 · Updated 4 years ago
swagshaw / TorchKWS
Collection of PyTorch implementations of Spoken Keyword Spotting presented in research papers.
☆41 · Apr 5, 2024 · Updated 2 years ago
ahaliassos / raven
Official implementation of RAVEn (ICLR 2023) and BRAVEn (ICASSP 2024)
☆82 · Feb 27, 2025 · Updated last year
NMS05 / Multimodal-Fusion-with-Attention-Bottlenecks
☆42 · Nov 22, 2024 · Updated last year
enoche / DGVAE
Disentangled Graph Variational Auto-Encoder for Multimodal Recommendation with Interpretability, IEEE TMM
☆16 · Jun 3, 2025 · Updated last year