dkurzend / ClipClap-GZSL
Audio-Visual Generalized Zero-Shot Learning using Large Pre-Trained Models
☆22 · Apr 15, 2024 · Updated last year
Alternatives and similar repositories for ClipClap-GZSL
Users who are interested in ClipClap-GZSL are comparing it to the repositories listed below.
- Rainbow Keywords - Official PyTorch Implementation · ☆13 · Jun 27, 2024 · Updated last year
- This repo contains conv-tasnet for basis-melgan. If you want the code for basis-melgan, please refer to FastVocoder. · ☆21 · Jul 21, 2021 · Updated 4 years ago
- This repository contains the code for our CVPR 2022 paper on "Audio-visual Generalised Zero-shot Learning with Cross-modal Attention and … · ☆42 · Nov 29, 2022 · Updated 3 years ago
- ☆22 · Mar 20, 2024 · Updated last year
- FCTalker: Fine and Coarse Grained Context Modeling for Expressive Conversational Speech Synthesis (Accepted by ISCSLP'2024) · ☆26 · Feb 22, 2024 · Updated last year
- [CVPR 2023] Official implementation of our paper - Learning Audio-Visual Source Localization via False Negative Aware Contrastive Learnin… · ☆27 · Apr 10, 2023 · Updated 2 years ago
- [AAAI 2024] AVSegFormer: Audio-Visual Segmentation with Transformer · ☆73 · Mar 6, 2025 · Updated 11 months ago
- Official code for WACV 2024 paper, "Annotation-free Audio-Visual Segmentation" · ☆37 · Oct 11, 2024 · Updated last year
- Official code for the paper: [ICCV2023] Sound Localization from Motion: Jointly Learning Sound Direction and Camera Rotation · ☆41 · Dec 23, 2023 · Updated 2 years ago
- Official PyTorch implementation of SGEM: Test-Time Adaptation for Automatic Speech Recognition via Sequential-Level Generalized Entropy M… · ☆37 · Aug 27, 2024 · Updated last year
- ☆38 · Apr 15, 2024 · Updated last year
- List of direct speech-to-speech translation papers. · ☆38 · Jan 31, 2023 · Updated 3 years ago
- ☆20 · Aug 8, 2025 · Updated 6 months ago
- WaveNet auto-encoders for ZeroSpeech challenge 2020 · ☆37 · Apr 7, 2022 · Updated 3 years ago
- Q-HEART: ECG Question Answering via Knowledge-Informed Multimodal LLMs (ECAI 2025) · ☆14 · Jan 23, 2026 · Updated 3 weeks ago
- ☆40 · Apr 14, 2025 · Updated 10 months ago
- Python phase-vocoder implementation with pitch shifting and formant correction · ☆14 · Feb 17, 2022 · Updated 3 years ago
- Installable package for RVC voice inferencing · ☆11 · Aug 11, 2024 · Updated last year
- A codebase for data crawling and preprocessing for TTS and ASR systems training. · ☆22 · Feb 5, 2026 · Updated last week
- ☆13 · May 21, 2024 · Updated last year
- Details of the datasets for Few-shot class-incremental audio classification · ☆11 · Dec 6, 2023 · Updated 2 years ago
- Code for Findings of ACL 2023 paper "Improving Zero-shot Multilingual Neural Machine Translation by Leveraging Cross-lingual Consistency … · ☆10 · Jul 18, 2023 · Updated 2 years ago
- MelGAN and Tacotron 2 in PyTorch · ☆11 · Oct 22, 2019 · Updated 6 years ago
- [CVPR 2024] "Towards Robust Audiovisual Segmentation in Complex Environments with Quantization-based Semantic Decomposition" · ☆12 · Feb 27, 2024 · Updated last year
- Text to speech · ☆10 · Mar 19, 2024 · Updated last year
- Phonemes and durations labeling based on whisper small · ☆11 · Jul 7, 2024 · Updated last year
- Chameleon: A MatMul-Free TCN Accelerator for End-to-End Few-Shot and Continual Learning from Sequential Data · ☆25 · Jun 6, 2025 · Updated 8 months ago
- ☆10 · Sep 17, 2021 · Updated 4 years ago
- Official Code Repository for the paper "Generating Realistic Images from In-the-wild Sounds", ICCV 2023 · ☆12 · Aug 24, 2025 · Updated 5 months ago
- Machine translation data processing tools · ☆10 · Apr 29, 2024 · Updated last year
- Code for ACL 2023 main conference paper "Back Translation for Speech-to-text Translation Without Transcripts". · ☆12 · Oct 25, 2023 · Updated 2 years ago
- A Python algorithm to detect foot contact and foot clearance using kinematic or inertial data during forward or backward walking · ☆11 · Aug 3, 2021 · Updated 4 years ago
- Official code release for "TDFNet: An Efficient Audio-Visual Speech Separation Model with Top-down Fusion", accepted at ICIST 2023 · ☆12 · Mar 17, 2024 · Updated last year
- ☆11 · Apr 12, 2024 · Updated last year
- An implementation of ORB-SLAM3 in Python. · ☆10 · Jul 2, 2023 · Updated 2 years ago
- Neural Lexicon Reader: Reduce Pronunciation Errors in End-to-end TTS by Leveraging External Textual Knowledge · ☆21 · Jul 25, 2022 · Updated 3 years ago
- ☆11 · Dec 8, 2022 · Updated 3 years ago
- Implementation for NATv2. · ☆23 · Feb 20, 2021 · Updated 4 years ago
- ☆14 · Mar 21, 2025 · Updated 10 months ago