nhut-ngnn / Voice-Based-Age-and-Gender-Recogniton
[ICTC'24] - "Voice-Based Age and Gender Recognition: A Comparative Study of LSTM, RezoNet and Hybrid CNNs-BiLSTM Architecture" by Nhut Minh Nguyen, Thanh Trung Nguyen, Hua Hiep Nguyen, Phuong-Nam Tran, Duc Ngoc Minh Dang
☆10 Updated 11 months ago
Alternatives and similar repositories for Voice-Based-Age-and-Gender-Recogniton
Users who are interested in Voice-Based-Age-and-Gender-Recogniton are comparing it to the repositories listed below
- ☆48 Updated 3 years ago
- MSP-Podcast Challenge Baseline Code for Interspeech 2025 ☆28 Updated last year
- Estimating the Age, Height, and Gender of a speaker with their speech signal. ☆14 Updated 3 years ago
- ☆11 Updated 2 months ago
- This repository contains the code for the paper "voc2vec: A Foundation Model for Non-Verbal Vocalization", accepted at ICASSP 2025. ☆46 Updated 8 months ago
- [SLT'24] The official implementation of SSAMBA: Self-Supervised Audio Representation Learning with Mamba State Space Model ☆133 Updated 2 months ago
- Repository of the WACV'24 paper "Can CLIP Help Sound Source Localization?" ☆33 Updated 10 months ago
- Read articles, explore effectiveness metrics for speech enhancement methodologies. Seamlessly integrate code implementations for better u… ☆25 Updated last year
- [ICLR 2025] Enhancing Self-Supervised Models with Audio Mixtures for Polyphonic Soundscapes ☆56 Updated 3 months ago
- This package aims at simplifying the download of the AudioSet dataset. ☆55 Updated 5 months ago
- Collection of works for evaluating (and analyzing) large audio-language models (LALMs) ☆40 Updated 4 months ago
- ICSD Dataset ☆40 Updated 6 months ago
- Official Implementation of the work "Audio Mamba: Bidirectional State Space Model for Audio Representation Learning" ☆165 Updated last year
- [INTERSPEECH 2025] Official code for "SEED: Speaker Embedding Enhancement Diffusion Model" ☆54 Updated 2 months ago
- A toolkit for researchers in the multimodal sound separation. ☆16 Updated 2 years ago
- A repo containing download guidance and corresponding scripts of the VoxBlink dataset. ☆28 Updated last year
- Official source code of the INTERSPEECH 2023 paper: "Audio-Visual Speech Separation in Noisy Environments with a Lightweight Iterative Mo… ☆20 Updated 2 years ago
- Official Pytorch implementation of "Large Language Models are Strong Audio-Visual Speech Recognition Learners" [ICASSP 2025] and "Mitigat… ☆52 Updated last month
- ☆25 Updated 5 months ago
- Implementation of the paper "wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations" in Pytorch. ☆54 Updated 2 years ago
- ☆13 Updated 4 months ago
- Source code and speech samples for the DSU-AVO paper accepted to INTERSPEECH 2023 ☆12 Updated last year
- Dynamic vision-guided speaker embedding for audio-visual speaker diarization ☆12 Updated 3 years ago
- small audio language model for reasoning ☆84 Updated last month
- Inference codebase for "Cacophony: An Improved Contrastive Audio-Text Model". Preprint: https://arxiv.org/abs/2402.06986 ☆48 Updated last year
- An official repo for the paper "Adapting Language-Audio Models as Few-Shot Audio Learners" ☆31 Updated 2 years ago
- PyTorch implementation of "Watch or Listen: Robust Audio-Visual Speech Recognition with Visual Corruption Modeling and Reliability Scorin… ☆20 Updated last year
- [ASRU 2025] Omni-R1: Do You Really Need Audio to Fine-Tune Your Audio LLM? ☆37 Updated last month
- (ICASSP 2025, official code) FlowSE: Flow Matching-based Speech Enhancement ☆84 Updated 5 months ago
- Inference code for Interspeech 2025 paper, "LSCodec: Low-Bitrate and Speaker-Decoupled Discrete Speech Codec" ☆34 Updated 2 months ago