earthspecies / ispa
☆24 · Updated 6 months ago
Alternatives and similar repositories for ispa
Users who are interested in ispa are comparing it to the libraries listed below.
- BEANS: The Benchmark of Animal Sounds ☆104 · Updated 7 months ago
- AVES: Animal Vocalization Encoder based on Self-Supervision ☆118 · Updated last month
- Codebase for the paper 'EncodecMAE: Leveraging neural codecs for universal audio representation learning' ☆97 · Updated 10 months ago
- Inference codebase for "Cacophony: An Improved Contrastive Audio-Text Model". Preprint: https://arxiv.org/abs/2402.06986 ☆46 · Updated 7 months ago
- Coco-Nut (Corpus of connecting NIHONGO utterance and text) corpus ☆22 · Updated 11 months ago
- [ISMIR 2023] LyricWhiz: Robust Multilingual Zero-shot Lyrics Transcription by Whispering to ChatGPT ☆48 · Updated last year
- ☆24 · Updated 3 weeks ago
- small audio language model for reasoning ☆64 · Updated last month
- Contrastive language-audio pretraining for bioacoustics ☆19 · Updated last year
- ☆40 · Updated 9 months ago
- Unofficial download repository for MusicCaps ☆47 · Updated 2 years ago
- ☆43 · Updated 11 months ago
- This repository contains the code for the paper "voc2vec: A Foundation Model for Non-Verbal Vocalization", accepted at ICASSP 2025. ☆31 · Updated last month
- (ICASSP 2025) Learning Source Disentanglement in Neural Audio Codec ☆33 · Updated 3 weeks ago
- Speech Human Evaluation Estimation Toolkit (SHEET) ☆82 · Updated this week
- A standardized toolkit of Kernel Audio Distance (KAD), a distribution-free, unbiased, and computationally efficient metric for evaluating … ☆70 · Updated 2 months ago
- A 6-million Audio-Caption Paired Dataset Built with a LLMs and ALMs-based Automatic Pipeline ☆137 · Updated 5 months ago
- Survey on speech generation work. ☆19 · Updated last year
- JamendoMaxCaps is a large-scale dataset of 362,000 instrumental creative commons tracks ☆36 · Updated 2 weeks ago
- PyTorch implementation of the ICASSP-24 paper: "Improving Audio Captioning Models with Fine-grained Audio Features, Text Embedding Superv… ☆37 · Updated last year
- ☆25 · Updated 2 months ago
- SERAB: a multi-lingual benchmark for speech emotion recognition ☆28 · Updated 2 years ago
- ☆63 · Updated last year
- LibriSpeech-Long is a benchmark dataset for long-form speech generation and processing. Released as part of "Long-Form Speech Generation … ☆67 · Updated 5 months ago
- Pre-training, fine-tuning, and inference code with the MAEST models for music analysis applications. ☆48 · Updated 3 months ago
- ARCH: Audio Representations benCHmark ☆45 · Updated 9 months ago
- JEPAs for audio representation learning ☆16 · Updated last year
- ☆83 · Updated 2 years ago
- ☆61 · Updated 7 months ago
- ☆27 · Updated last week