my-yy / vfal_papers
Voice Face Association Learning Paper List
☆17 · May 20, 2023 · Updated 2 years ago
Alternatives and similar repositories for vfal_papers
Users who are interested in vfal_papers are comparing it to the repositories listed below.
- Official implementation of SBNet as described in "Single-branch Network for Multimodal Training". ☆12 · Aug 28, 2023 · Updated 2 years ago
- [IJCAI2022] Unsupervised Voice-Face Representation Learning by Cross-Modal Prototype Contrast ☆21 · Oct 25, 2023 · Updated 2 years ago
- ☆12 · Jun 14, 2022 · Updated 3 years ago
- ☆11 · Nov 5, 2025 · Updated 3 months ago
- ☆19 · Mar 2, 2024 · Updated last year
- Multi-Stage Face-Voice Association Learning with Keynote Speaker Diarization (ACM MM 2024) ☆22 · Jul 25, 2024 · Updated last year
- Pytorch implementation of our paper: Audio-Visual Speech Separation with Visual Features Enhanced by Adversarial Training. ☆18 · Jul 11, 2022 · Updated 3 years ago
- ASCL: Adaptive Soft Contrastive Learning (ICPR2022) ☆22 · Mar 22, 2025 · Updated 10 months ago
- Code for Audio-Visual Target Speaker Extraction with Selective Auditory Attention (TASLP) ☆29 · Feb 28, 2025 · Updated 11 months ago
- Voice-Face Association Learning Evaluation ☆49 · Feb 13, 2024 · Updated 2 years ago
- Download and preprocess voxceleb datasets. ☆38 · Jun 18, 2025 · Updated 7 months ago
- Tools for downloading VoxCeleb2 dataset ☆33 · Mar 16, 2024 · Updated last year
- Proton density fat fraction calculation for MRI ☆11 · Jul 2, 2025 · Updated 7 months ago
- This branch of Asteroid contains code for the vocal harmony and chamber ensemble separation related papers. ☆12 · Nov 7, 2024 · Updated last year
- Frequency tracking in time-frequency representations ☆13 · Jan 19, 2021 · Updated 5 years ago
- ICASSP 2022: 'Self-supervised Speaker Recognition with Loss-gated Learning' ☆92 · May 29, 2023 · Updated 2 years ago
- ☆42 · Nov 22, 2024 · Updated last year
- This repository contains the official code for "Flexible Biometrics Recognition: Bridging the Multimodality Gap through Attention, Alignm… ☆12 · Oct 9, 2024 · Updated last year
- Time frequency ridge detection based on relevant ridge portions ☆11 · Aug 17, 2023 · Updated 2 years ago
- This repository contains the speaker labeled information of VoxCeleb2 and LRS3 audio-visual datasets. (AAAI 2025) ☆12 · Sep 6, 2024 · Updated last year
- AD-TUNING: An Adaptive CHILD-TUNING Approach to Efficient Hyperparameter Optimization of Child Networks for Speech Processing Tasks in th… ☆11 · Feb 23, 2024 · Updated last year
- Code and data recipes for the paper: Optimal Condition Training for Target Source Separation by Efthymios Tzinis, Gordon Wichern, Paris S… ☆14 · Feb 15, 2023 · Updated 3 years ago
- ICASSP 2023: 'Speaker recognition with two-step multi-modal deep cleansing' ☆44 · Oct 31, 2022 · Updated 3 years ago
- PyTorch implementation of ECAPA-TDNN-based speaker recognition for the CN-Celeb dataset ☆13 · Apr 3, 2023 · Updated 2 years ago
- Human age estimation using deep neural networks (Keras) ☆13 · Aug 10, 2023 · Updated 2 years ago
- Cross-Speaker Encoding Network for Multi-talker Speech Recognition ☆11 · Mar 14, 2025 · Updated 11 months ago
- Speaker overlap-aware Neural Diarization ☆12 · Feb 13, 2023 · Updated 3 years ago
- ☆10 · Nov 16, 2021 · Updated 4 years ago
- Reproducible research code for the experiments presented in our article "Kara1k: a karaoke dataset for cover song identification and sing… ☆10 · Jan 9, 2018 · Updated 8 years ago
- [CVPR 2025] Your Large Vision-Language Model Only Needs A Few Attention Heads For Visual Grounding ☆16 · Oct 4, 2025 · Updated 4 months ago
- An exploration of LLM steering ☆24 · Jun 15, 2024 · Updated last year
- Examples of how to use API of MVSep service ☆28 · Jun 21, 2025 · Updated 7 months ago
- Python library for searching lyrics on Musixmatch, Genius and letras.mus.br. ☆10 · Oct 10, 2024 · Updated last year
- ☆13 · Sep 26, 2023 · Updated 2 years ago
- 2023 Spring SNU Computer Vision Project ☆14 · Jun 13, 2023 · Updated 2 years ago
- LLaVA-Next for STVG ☆18 · Dec 5, 2025 · Updated 2 months ago
- A PyTorch implementation of Speech Transformer with multi-GPUs, an End-to-End ASR with Transformer network on Mandarin Chinese. This code… ☆10 · Dec 25, 2019 · Updated 6 years ago
- [ICTC'24] - "Voice-Based Age and Gender Recognition: A Comparative Study of LSTM, RezoNet and Hybrid CNNs-BiLSTM Architecture" by Nhut Mi… ☆10 · Jan 16, 2025 · Updated last year
- This is the code for controllable EVC framework for seen and unseen emotion generation. ☆46 · Nov 3, 2021 · Updated 4 years ago