IS2AI/SpeakingFaces

A large-scale publicly-available visual-thermal-audio dataset designed to encourage research in the general areas of user authentication, facial recognition, speech recognition, and human-computer interaction.

☆88

Alternatives and similar repositories for SpeakingFaces

Users who are interested in SpeakingFaces are comparing it to the repositories listed below.

Alpkant / Thermal-to-Visible-Face-Recognition-Using-Deep-Autoencoders
Official repository of our BIOSIG19 paper "Thermal to Visible Face Recognition Using Deep Autoencoders"
☆35 · Jan 19, 2021 · Updated 5 years ago
KishanKancharagunta / PCSGAN
PCSGAN: Perceptual Cyclic-Synthesized Generative Adversarial Networks for Thermal/NIR to Visible Image Transformation
☆13 · Feb 10, 2020 · Updated 6 years ago
yucongzh / online_speaker_diarization
☆15 · Jul 11, 2022 · Updated 3 years ago
hujiecpp / pGAN
Code for the paper: Robust Face Sketch Synthesis via Generative Adversarial Fusion of Priors and Parametric Sigmoid (pGAN) [IJCAI 2018]
☆17 · Oct 15, 2019 · Updated 6 years ago
NVlabs / EoRA
[ICLRW'26] EoRA: Fine-tuning-free Compensation for Compressed LLM with Eigenspace Low-Rank Approximation
☆40 · Apr 21, 2026 · Updated 2 weeks ago
kamperh / globalphone_awe
Multilingual acoustic word embedding approaches applied and evaluated on GlobalPhone data.
☆11 · Nov 3, 2020 · Updated 5 years ago
jasonppy / syllable-discovery
Syllable Segmentation and Cross-Lingual Generalization in a Visually Grounded, Self-Supervised Speech Model
☆35 · Aug 27, 2023 · Updated 2 years ago
leohuang2013 / pyannote-audio_overlapped-speech-detection_cpp
C++ version of pyannote audio overlapped speech detection pipeline
☆13 · Feb 14, 2024 · Updated 2 years ago
HJ0Wang / FFE-CycleGAN
Official code for paper 'FFE-CycleGAN: A specialized optimization method of CycleGAN for VIS-NIR Heterogeneous Face Recognition'
☆13 · Sep 23, 2021 · Updated 4 years ago
dr-pato / audio_visual_speech_enhancement
Face Landmark-based Speaker-Independent Audio-Visual Speech Enhancement in Multi-Talker Environments
☆111 · Mar 19, 2024 · Updated 2 years ago
kastnerkyle / diphone_synthesizer
A tutorial diphone synthesizer in Python
☆25 · Nov 26, 2018 · Updated 7 years ago
Rudrabha / 8X-Super-Resolution
This repository is a repository for the paper, "Irgun: Improved residue based gradual up-scaling network for single image super resolutio…
☆16 · Aug 26, 2020 · Updated 5 years ago
BUTSpeechFIT / diacorrect
Error correction back-end for speaker diarization
☆18 · Sep 26, 2023 · Updated 2 years ago
suralmasha / RuTranscript
Russian phonetic transcription
☆11 · Nov 19, 2025 · Updated 5 months ago
haoheliu / ontology-aware-audio-tagging
☆14 · Nov 22, 2022 · Updated 3 years ago
iglaweb / HippoYD
Mouth openness classifier trained with TensorFlow and video dataset YawDD
☆36 · May 3, 2021 · Updated 5 years ago
lelechen63 / Talking-head-Generation-with-Rhythmic-Head-Motion
☆209 · Mar 10, 2021 · Updated 5 years ago
yichen14 / FastAdaSP
Code for the paper "FastAdaSP: An Efficient Multitask Inference Framework for Large Speech Language Models" (EMNLP'24, Oral)
☆17 · Nov 14, 2024 · Updated last year
tzyll / ChineseHP
☆15 · Jul 4, 2024 · Updated last year
rithiksachdev / PostASR-Correction-SLT2024
☆18 · Jul 22, 2024 · Updated last year
Koziev / StressModel
Neural model for prediction of stress position in Russian words
☆13 · Jun 22, 2025 · Updated 10 months ago
cyhuang-tw / robust-vc
☆11 · May 7, 2022 · Updated 3 years ago
panzhang0212 / CoCosNet
☆11 · Jun 20, 2020 · Updated 5 years ago
2bbb / node-abletonlink-example
Example of node-abletonlink
☆19 · Dec 7, 2022 · Updated 3 years ago
backspacetg / distilXLSR
Models and code for the INTERSPEECH 2023 paper "DistilXLSR: A Light Weight Cross-Lingual Speech Representation Model"
☆13 · Mar 30, 2025 · Updated last year
pkufool / simple-wer
A simple command line tool to calculate WER for ASR.
☆14 · Oct 14, 2024 · Updated last year
shashankshirol / GeneratingNoisySpeechData
A repository comprising code for generating noisy speech data from clean data using deep learning methods
☆16 · Jul 12, 2021 · Updated 4 years ago
qiujiali / lattice-rescore
☆16 · Jun 13, 2022 · Updated 3 years ago
uniBruce / Mead
MEAD: A Large-scale Audio-visual Dataset for Emotional Talking-face Generation [ECCV2020]
☆301 · Jul 7, 2024 · Updated last year
ZiweiWangTHU / Quantformer
This is the official PyTorch implementation for the paper "Quantformer: Learning Extremely Low-precision Vision Transformers".
☆31 · Nov 14, 2022 · Updated 3 years ago
miccio-dk / NISQA
NISQA - Non-Intrusive Speech Quality and TTS Naturalness Assessment
☆16 · Apr 13, 2022 · Updated 4 years ago
ashi-ta / speechGLUE
SpeechGLUE is a speech version of the GLUE benchmark, driven by text-to-speech.
☆13 · Jun 2, 2023 · Updated 2 years ago
jadfegh / audiovision
Real-time Speech Separation, Noise Suppression & Speaker Recognition
☆17 · Apr 17, 2019 · Updated 7 years ago
patyork / AutomaticSpeechChunker
From a large speech audio file and its corresponding body of text, automatically chunk the audio and text into (phrase, audio_snippet) pa…
☆17 · May 15, 2015 · Updated 10 years ago
smeetrs / deep_avsr
A PyTorch implementation of the Deep Audio-Visual Speech Recognition paper.
☆243 · Feb 15, 2024 · Updated 2 years ago
funcwj / voice-filter
An unofficial PyTorch implementation of Google's VoiceFilter
☆104 · Jul 6, 2023 · Updated 2 years ago
zelaki / DisfluentFA
A Weakly Supervised Forced Alignment for disfluent speech
☆15 · Nov 12, 2023 · Updated 2 years ago
JinhuaLiang / APT
☆20 · Mar 12, 2025 · Updated last year
jasonppy / PromptingWhisper
Prompting Whisper for Audio-Visual Speech Recognition, Code-Switched Speech Recognition, and Zero-Shot Speech Translation
☆151 · Jan 16, 2024 · Updated 2 years ago