ALM-LAB / PACE
PACE (Podcast AI for Chapters and Episodes) is a semantic search engine that helps you find the information you need, both across and within podcasts (a project for the AssemblyAI Winter 2022 Hackathon).
☆15 · Updated 2 years ago
Alternatives and similar repositories for PACE:
Users who are interested in PACE compare it to the libraries listed below.
- Prompting Whisper for Audio-Visual Speech Recognition, Code-Switched Speech Recognition, and Zero-Shot Speech Translation ☆143 · Updated last year
- ☆62 · Updated 8 months ago
- This repository contains code to fine-tune the WhisperASR model ☆23 · Updated 2 years ago
- Joint speech-language model: respond directly to audio! ☆30 · Updated 10 months ago
- ITALIC: An ITALian Intent Classification Dataset ☆12 · Updated last year
- ☆352 · Updated last year
- Go from raw audio files to a text-audio dataset automatically with OpenAI's Whisper. ☆135 · Updated last year
- A simple, consistent and extendable toolkit for IndicTrans2 ☆24 · Updated 3 weeks ago
- This repository contains the code for the paper "voc2vec: A Foundation Model for Non-Verbal Vocalization", accepted at ICASSP 2025. ☆13 · Updated last month
- Official repository for the "Powerset multi-class cross entropy loss for neural speaker diarization" paper published in Interspeech 2023. ☆81 · Updated last year
- ☆280 · Updated 9 months ago
- Provides Gradio custom components to make the diarization-based audio labeling process easier and faster. ☆61 · Updated 3 weeks ago
- ☆84 · Updated last week
- ☆156 · Updated last year
- ☆18 · Updated 2 years ago
- Tokun to can tokens ☆16 · Updated this week
- Towards Building Text-To-Speech Systems for the Next Billion Users: Microsoft Research intern work, accepted at ICASSP 2023 ☆51 · Updated last year
- Speaker diarization service ☆21 · Updated last month
- Open TTS models, built for streaming on the edge ☆39 · Updated 2 weeks ago
- ☆104 · Updated this week
- Various speech datasets made available to the public ☆115 · Updated 3 months ago
- This public GitHub repository contains code for a fully self-hosted, on-premise transcription solution. ☆53 · Updated 3 months ago
- A collection of datasets for language model pretraining, including scripts for downloading, preprocessing, and sampling. ☆56 · Updated 8 months ago
- [ICASSP 2025] Official PyTorch implementation of "Large Language Models are Strong Audio-Visual Speech Recognition Learners". ☆14 · Updated 3 weeks ago
- Speaker Diarization with Transformers ☆64 · Updated 10 months ago
- [Interspeech 2024] Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation ☆145 · Updated last month
- ☆11 · Updated 2 years ago
- Repository for the LLM course ☆14 · Updated 3 months ago
- Code for the paper "GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities" ☆119 · Updated 3 months ago
- VoiceBench: Benchmarking LLM-Based Voice Assistants ☆159 · Updated last week