Sreyan88 / RECAPView external linksLinks
Code for ICASSP 2024 Paper: RECAP: Retrieval-Augmented Audio Captioning
☆16Jun 23, 2024Updated last year
Alternatives and similar repositories for RECAP
Users that are interested in RECAP are comparing it to the libraries listed below
Sorting:
- PyTorch implementation of the ICASSP-24 paper: "Improving Audio Captioning Models with Fine-grained Audio Features, Text Embedding Superv…☆38Jan 6, 2024Updated 2 years ago
- Code for ICLR 2024 Paper: CompA: Addressing the Gap in Compositional Reasoning in Audio-Language Models☆22Jul 10, 2024Updated last year
- semantic tokenizer for speech and music☆21Jul 6, 2025Updated 7 months ago
- Awesome Multimodal Fusion in Speech Emotion Recognition☆13Nov 11, 2025Updated 3 months ago
- Explaining audio differences using language☆16Feb 11, 2025Updated last year
- ☆10Oct 16, 2025Updated 3 months ago
- Cross-Speaker Encoding Network for Multi-talker Speech Recognition☆11Mar 14, 2025Updated 11 months ago
- A chinese singing voice dataset, professional male singer, 105 songs, 132 minutes☆11Oct 19, 2023Updated 2 years ago
- Inference codebase for "Cacophony: An Improved Contrastive Audio-Text Model". Preprint: https://arxiv.org/abs/2402.06986☆48Jan 19, 2026Updated 3 weeks ago
- Official code for paper:"Speaking Clearly: A Simplified Whisper-Based Codec for Low-Bitrate Speech Coding"☆28Jan 28, 2026Updated 2 weeks ago
- Accompanying code for paper "Attention-Based Contextual Language Model Adaptation for Speech Recognition", submitted to ACL 2021.☆14Jul 25, 2023Updated 2 years ago
- ☆14Mar 25, 2023Updated 2 years ago
- DWFormer: Dynamic Window Transformer for Speech Emotion Recognition(ICASSP 2023 Oral)☆69Jul 8, 2024Updated last year
- Code of the paper "Low-Latency Speech Separation Guided Diarization for Telephone Conversations"☆14Dec 22, 2022Updated 3 years ago
- Code for InterSpeech 2024 Paper: LipGER: Visually-Conditioned Generative Error Correction for Robust Automatic Speech Recognition☆18Jul 16, 2024Updated last year
- ☆17May 5, 2024Updated last year
- The official implementation of the method discussed in the paper Improving Spoken Language Identification with Map-Mix(work accepted at I…☆18Feb 17, 2023Updated 2 years ago
- Code for the submitted 2021 DCASE Workshop paper: "Waveforms and Spectrograms: Enhancing Acoustic Scene Classification Using Multimodal F…☆16Aug 9, 2021Updated 4 years ago
- WildDESED: A LLM-Powered Dataset for Wild Domestic Environment Sound Event Detection☆17Nov 19, 2024Updated last year
- [ACM MM 2023] Official PyTorch implementation of "Emo-DNA: Emotion Decoupling and Alignment Learning for Cross-Corpus Speech Emotion Reco…☆12Aug 4, 2023Updated 2 years ago
- Efficient Training for Multilingual Visual Speech Recognition: Pre-training with Discretized Visual Speech Representation (ACM MM 2024)☆20Mar 17, 2025Updated 10 months ago
- Official Implementation of GLAP - General Language Audio Pretraining☆61Jan 5, 2026Updated last month
- Implementation of the model "AudioFlamingo" from the paper: "Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dial…☆40Jan 27, 2025Updated last year
- Prompting Large Language Models with Audio for General-Purpose Speech Summarization☆19May 14, 2025Updated 8 months ago
- ☆56Jul 13, 2025Updated 7 months ago
- Audio captioning recipe☆51Oct 23, 2025Updated 3 months ago
- [CVPR 2025] Official implementation of paper "Prosody-Enhanced Acoustic Pre-training and Acoustic-Disentangled Prosody Adapting for Movie…☆23Jun 6, 2025Updated 8 months ago
- Training, validation, and inference code for various SSL approaches and architectures.☆77Oct 22, 2025Updated 3 months ago
- Official code release for "RTFS-Net: Recurrent time-frequency modelling for efficient audio-visual speech separation", accepted ICLR 2024☆49Oct 14, 2025Updated 3 months ago
- Acoustic echo cancelation(AEC) is a main algorithm in the pipe line of acoustic devices with KWS or ASR. FNLMS is used.☆19Apr 22, 2019Updated 6 years ago
- SA-toolkit: Speaker speech anonymization toolkit in python☆30Sep 18, 2025Updated 4 months ago
- This repository contains a short introduction on the topic of audio and speech processing -- from basics to applications.☆21Dec 20, 2023Updated 2 years ago
- An End-to-End Pipeline for Enhanced French Text-to-Speech with SSML Prosody Control☆30Jan 13, 2026Updated last month
- [INTERSPEECH 2024] EmoBox: Multilingual Multi-corpus Speech Emotion Recognition Toolkit and Benchmark☆308Mar 31, 2025Updated 10 months ago
- Pytorch implementation for Egoinstructor at CVPR 2024☆28Dec 1, 2024Updated last year
- Pushing the Limits of Zero-shot End-to-End Speech Translation☆26Dec 12, 2024Updated last year
- My hybrid TTS network that combines, VALL-E, VoiceBox, SpeechFlow, Seamless and TortoiseTTS into one☆26Aug 5, 2024Updated last year
- Supervoice diffusion enhance☆28Jul 15, 2024Updated last year
- Official Implementation of EnCLAP (ICASSP 2024)☆94Jun 2, 2024Updated last year