jasonppy / PromptingWhisper
Prompting Whisper for Audio-Visual Speech Recognition, Code-Switched Speech Recognition, and Zero-Shot Speech Translation
☆144 Updated last year
Alternatives and similar repositories for PromptingWhisper
Users that are interested in PromptingWhisper are comparing it to the libraries listed below
- Official repository for the "Powerset multi-class cross entropy loss for neural speaker diarization" paper published in Interspeech 2023. ☆83 Updated last year
- Zero-shot Domain-sensitive Speech Recognition with Prompt-conditioning Fine-tuning (ASRU2023) ☆27 Updated last year
- EMNLP 23 - Integrating Whisper Encoder to LLaMA Decoder for Generative ASR Error Correction ☆252 Updated last year
- Various speech datasets made available to the public ☆118 Updated 5 months ago
- AudioBench: A Universal Benchmark for Audio Large Language Models ☆215 Updated last week
- CVSS: A Massively Multilingual Speech-to-Speech Translation Corpus ☆195 Updated 2 years ago
- Speaker identification/verification models for Machine Learning for Computer Vision class at UNIBO ☆63 Updated 2 years ago
- Speaker change detection using SincNet and an LSTM/Transformer ☆51 Updated last week
- Official repository of DailyTalk: Spoken Dialogue Dataset for Conversational Text-to-Speech, ICASSP 2023 ☆218 Updated 2 years ago
- ☆79 Updated last year
- Code for the paper: GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities ☆127 Updated 5 months ago
- Whisper-Flamingo [Interspeech 2024] and mWhisper-Flamingo [IEEE SPL 2025] for Audio-Visual Speech Recognition and Translation ☆165 Updated 3 weeks ago
- Official code for Wav2Seq ☆96 Updated 2 years ago
- This is a list of speech tasks and datasets, which can provide training data for Generative AI, AIGC, AI model training, intelligent spee… ☆76 Updated 11 months ago
- Code for our INTERSPEECH paper Simul-Whisper: Attention-Guided Streaming Whisper with Truncation Detection ☆62 Updated 2 months ago
- A mini, simple, and fast end-to-end automatic speech recognition toolkit. ☆51 Updated 2 years ago
- ☆103 Updated this week
- This Repository surveys the paper focusing on Prompting and Adapters for Speech Processing. ☆108 Updated last year
- Libriheavy: a 50,000 hours ASR corpus with punctuation casing and context ☆194 Updated 8 months ago
- VoiceBench: Benchmarking LLM-Based Voice Assistants ☆211 Updated 3 weeks ago
- Clustering-based methods for overlapping diarization ☆81 Updated last year
- The implementation for "Large Language Model Can Transcribe Speech in Multi-Talker Scenarios with Versatile Instructions" ☆40 Updated last month
- ☆67 Updated 8 months ago
- Reproducible experimental protocols for multimedia (audio, video, text) database ☆100 Updated 3 months ago
- A sequence-to-sequence voice conversion toolkit. ☆98 Updated 10 months ago
- An espeak-compatible, permissively-licensed IPA phonemizer (G2P) based on DeepPhonemizer. Usable as a drop-in replacement for espeak's GP… ☆98 Updated 7 months ago
- ☆291 Updated 11 months ago
- asr2k ☆50 Updated 11 months ago
- Unified Speech Language Model for paper "SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models" (ICLR 2024) ☆145 Updated last year
- Predicts the level of noise and reverberation on your audiofiles ☆150 Updated last year