earthspecies / ispa
☆25 · Updated 8 months ago
Alternatives and similar repositories for ispa
Users who are interested in ispa are comparing it to the libraries listed below.
- Code for the paper: GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities ☆135 · Updated 7 months ago
- ☆103 · Updated 2 months ago
- Implementation of SoundStorm built upon SpeechTokenizer. ☆112 · Updated last year
- A 6-million Audio-Caption Paired Dataset Built with an LLM- and ALM-based Automatic Pipeline ☆163 · Updated 7 months ago
- LibriSpeech-Long is a benchmark dataset for long-form speech generation and processing. Released as part of "Long-Form Speech Generation … ☆78 · Updated 6 months ago
- Survey on speech generation work. ☆20 · Updated last year
- Repository for fine-tuning BEATs and using BEATs as feature extractor in a prototypical network. This repository has been used to complet… ☆34 · Updated 4 months ago
- Companion repo for the paper "PixIT: Joint Training of Speaker Diarization and Speech Separation from Real-world Multi-speaker Recordings… ☆95 · Updated 6 months ago
- Audiogen Codec ☆140 · Updated last year
- [AAAI 2024] Code for CTX-vec2wav in UniCATS ☆129 · Updated last year
- A trainer for SNAC (Multi-Scale Neural Audio Codec) in which the decoder has been replaced with Vocos. ☆55 · Updated 8 months ago
- ☆44 · Updated last year
- Expressive Anechoic Recordings of Speech (EARS) ☆169 · Updated last year
- SoloAudio: Target Sound Extraction with Language-oriented Audio Diffusion Transformer. ☆95 · Updated 6 months ago
- [SLT'24] The official implementation of SSAMBA: Self-Supervised Audio Representation Learning with Mamba State Space Model ☆122 · Updated 9 months ago
- Inference codebase for "Cacophony: An Improved Contrastive Audio-Text Model". Preprint: https://arxiv.org/abs/2402.06986 ☆47 · Updated 9 months ago
- Versatile Evaluation of Speech and Audio ☆300 · Updated 2 weeks ago
- BEANS: The Benchmark of Animal Sounds ☆104 · Updated 9 months ago
- Source for the Interspeech 2024 Paper "Scaling up masked audio encoder learning for general audio classification" ☆68 · Updated 2 months ago
- ☆58 · Updated 2 years ago
- ☆63 · Updated last year
- Style-Controllable Zero-Shot Text to Speech Synthesizer based on VALL-E ☆137 · Updated 8 months ago
- AVES: Animal Vocalization Encoder based on Self-Supervision ☆122 · Updated 3 months ago
- Unofficial implementation of miipher ☆129 · Updated last year
- SCOREQ: Speech COntrastive REgression for Quality Assessment (NeurIPS 2024) ☆74 · Updated 3 weeks ago
- Audio Captioning datasets for PyTorch. ☆121 · Updated last week
- Codebase for the paper 'EncodecMAE: Leveraging neural codecs for universal audio representation learning' ☆97 · Updated 11 months ago
- Baseline multi-resolution cross network model trained using the Divide and Remaster Dataset ☆82 · Updated last year
- This is the M-AILABS Speech Dataset ☆71 · Updated 7 months ago
- Contains the code associated with the ICLR submission for our text-to-speech diffusion model ☆54 · Updated last year