jasonppy / PromptingWhisper
Prompting Whisper for Audio-Visual Speech Recognition, Code-Switched Speech Recognition, and Zero-Shot Speech Translation
☆132 · Updated 9 months ago
Related projects
Alternatives and complementary repositories for PromptingWhisper
- AudioBench: A Universal Benchmark for Audio Large Language Models ☆89 · Updated last month
- ☆51 · Updated last week
- Zero-shot Domain-sensitive Speech Recognition with Prompt-conditioning Fine-tuning (ASRU2023) ☆26 · Updated last year
- CVSS: A Massively Multilingual Speech-to-Speech Translation Corpus ☆182 · Updated 2 years ago
- Libriheavy: a 50,000 hours ASR corpus with punctuation casing and context ☆179 · Updated last month
- EMNLP 23 - Integrating Whisper Encoder to LLaMA Decoder for Generative ASR Error Correction ☆230 · Updated 5 months ago
- Multilingual G2P in 100 languages ☆285 · Updated last year
- Official code for Wav2Seq ☆95 · Updated 2 years ago
- Official repository of DailyTalk: Spoken Dialogue Dataset for Conversational Text-to-Speech, ICASSP 2023 ☆202 · Updated last year
- Various speech datasets made available to the public ☆98 · Updated last month
- PyTorch implementation of Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities. ☆189 · Updated last month
- Unified Speech Language Model for paper "SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models" (ICLR 2024) ☆136 · Updated last year
- Audio Large Language Models ☆126 · Updated 2 weeks ago
- Code for the paper: GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities ☆77 · Updated 3 months ago
- Implementation of SoundStorm built upon SpeechTokenizer. ☆103 · Updated last year
- [INTERSPEECH 2024] EmoBox: Multilingual Multi-corpus Speech Emotion Recognition Toolkit and Benchmark ☆140 · Updated 4 months ago
- Speaker change detection using SincNet and an LSTM/Transformer ☆44 · Updated 4 months ago
- **Interspeech 2022** 《SpeechPrompt: An Exploration of Prompt Tuning on Generative Spoken Language Model for Speech Processing Tasks》Speec… ☆97 · Updated last year
- This repository surveys papers focusing on Prompting and Adapters for Speech Processing. ☆103 · Updated last year
- Official repository for the "Powerset multi-class cross entropy loss for neural speaker diarization" paper published in Interspeech 2023. ☆70 · Updated last year
- An espeak-compatible, permissively-licensed IPA phonemizer (G2P) based on DeepPhonemizer. Usable as a drop-in replacement for espeak's GP… ☆83 · Updated last month
- Scripts for computing the Intelligibility and CLVP scores for evaluating TTS models ☆140 · Updated 10 months ago
- ☆64 · Updated last year
- 《SpeechPrompt v2: Prompt Tuning for Speech Classification Tasks》Speech processing with prompting paradigm ☆81 · Updated last year
- AIR-Bench: Benchmarking Large Audio-Language Models via Generative Comprehension ☆50 · Updated 2 months ago
- This is a list of speech tasks and datasets, which can provide training data for Generative AI, AIGC, AI model training, intelligent spee… ☆72 · Updated 5 months ago
- A toolkit for Spoken Language Understanding Evaluation (SLUE) benchmark. Refer to the paper https://arxiv.org/abs/2111.10367 for more details. O… ☆62 · Updated 8 months ago
- ☆302 · Updated 2 months ago
- Implementation of BEST-RQ - a model for self-supervised learning of speech signals using a random projection quantizer, in Pytorch. ☆93 · Updated last year