A large-scale publicly-available visual-thermal-audio dataset designed to encourage research in the general areas of user authentication, facial recognition, speech recognition, and human-computer interaction.
☆88Jul 24, 2025Updated 9 months ago
Alternatives and similar repositories for SpeakingFaces
Users that are interested in SpeakingFaces are comparing it to the libraries listed below.
- Official repository of our BIOSIG19 paper "Thermal to Visible Face Recognition Using Deep Autoencoders"☆35Jan 19, 2021Updated 5 years ago
- PCSGAN: Perceptual Cyclic-Synthesized Generative Adversarial Networks for Thermal/NIR to Visible Image Transformation☆13Feb 10, 2020Updated 6 years ago
- ☆15Jul 11, 2022Updated 3 years ago
- The code of paper: Robust Face Sketch Synthesis via Generative Adversarial Fusion of Priors and Parametric Sigmoid (pGAN) [IJCAI 2018]☆17Oct 15, 2019Updated 6 years ago
- [ICLRW'26] EoRA: Fine-tuning-free Compensation for Compressed LLM with Eigenspace Low-Rank Approximation☆40Apr 21, 2026Updated 2 weeks ago
- Multilingual acoustic word embedding approaches applied and evaluated on GlobalPhone data.☆11Nov 3, 2020Updated 5 years ago
- Syllable Segmentation and Cross-Lingual Generalization in a Visually Grounded, Self-Supervised Speech Model☆35Aug 27, 2023Updated 2 years ago
- C++ version of pyannote audio overlapped speech detection pipeline☆13Feb 14, 2024Updated 2 years ago
- Official code for paper 'FFE-CycleGAN: A specialized optimization method of CycleGAN for VIS-NIR Heterogeneous Face Recognition'☆13Sep 23, 2021Updated 4 years ago
- Face Landmark-based Speaker-Independent Audio-Visual Speech Enhancement in Multi-Talker Environments☆111Mar 19, 2024Updated 2 years ago
- A tutorial diphone synthesizer in Python☆25Nov 26, 2018Updated 7 years ago
- Repository for the paper "Irgun: Improved residue based gradual up-scaling network for single image super resolutio…☆16Aug 26, 2020Updated 5 years ago
- Error correction back-end for speaker diarization☆18Sep 26, 2023Updated 2 years ago
- Russian phonetic transcription☆11Nov 19, 2025Updated 5 months ago
- ☆14Nov 22, 2022Updated 3 years ago
- Mouth openness classifier trained with TensorFlow and video dataset YawDD☆36May 3, 2021Updated 5 years ago
- ☆209Mar 10, 2021Updated 5 years ago
- Code for the paper "FastAdaSP: An Efficient Multitask Inference Framework for Large Speech Language Models". @ EMNLP'24(Oral)☆17Nov 14, 2024Updated last year
- ☆15Jul 4, 2024Updated last year
- ☆18Jul 22, 2024Updated last year
- Neural model for prediction of stress position in Russian words☆13Jun 22, 2025Updated 10 months ago
- ☆11May 7, 2022Updated 3 years ago
- ☆11Jun 20, 2020Updated 5 years ago
- example of node-abletonlink☆19Dec 7, 2022Updated 3 years ago
- Models and codes for INTERSPEECH 2023 paper DistilXLSR: A Light Weight Cross-Lingual Speech Representation Model☆13Mar 30, 2025Updated last year
- A simple command line tool to calculate WER for ASR.☆14Oct 14, 2024Updated last year
- A repository of code for generating noisy speech data from clean data using deep learning methods☆16Jul 12, 2021Updated 4 years ago
- ☆16Jun 13, 2022Updated 3 years ago
- MEAD: A Large-scale Audio-visual Dataset for Emotional Talking-face Generation [ECCV2020]☆301Jul 7, 2024Updated last year
- This is the official pytorch implementation for the paper: *Quantformer: Learning Extremely Low-precision Vision Transformers*.☆31Nov 14, 2022Updated 3 years ago
- NISQA - Non-Intrusive Speech Quality and TTS Naturalness Assessment☆16Apr 13, 2022Updated 4 years ago
- SpeechGLUE is a speech version of the GLUE benchmark, driven by text-to-speech.☆13Jun 2, 2023Updated 2 years ago
- Real-time Speech Separation, Noise Suppression & Speaker Recognition☆17Apr 17, 2019Updated 7 years ago
- From a large speech audio file and its corresponding body of text, automatically chunk the audio and text into (phrase, audio_snippet) pa…☆17May 15, 2015Updated 10 years ago
- A PyTorch implementation of the Deep Audio-Visual Speech Recognition paper.☆243Feb 15, 2024Updated 2 years ago
- An unofficial PyTorch implementation of Google's VoiceFilter☆104Jul 6, 2023Updated 2 years ago
- A Weakly Supervised Forced Alignment for disfluent speech☆15Nov 12, 2023Updated 2 years ago
- ☆20Mar 12, 2025Updated last year
- Prompting Whisper for Audio-Visual Speech Recognition, Code-Switched Speech Recognition, and Zero-Shot Speech Translation☆151Jan 16, 2024Updated 2 years ago