ALM-LAB / PACE
PACE (Podcast AI for Chapters and Episodes) is a semantic search engine that helps you find the information you need, both across and within podcasts. (Project for the AssemblyAI Winter 2022 Hackathon.)
☆15 · Updated 2 years ago
Alternatives and similar repositories for PACE:
Users that are interested in PACE are comparing it to the libraries listed below
- Prompting Whisper for Audio-Visual Speech Recognition, Code-Switched Speech Recognition, and Zero-Shot Speech Translation ☆143 · Updated last year
- Joint speech-language model - respond directly to audio! ☆30 · Updated 11 months ago
- Provide Gradio custom components to make the diarization-based audio labeling process easier and faster. ☆62 · Updated 2 weeks ago
- A simple, consistent and extendable toolkit for IndicTrans2 ☆25 · Updated last month
- This public GitHub repository contains code for a fully self-hosted, on-premise transcription solution. ☆53 · Updated 4 months ago
- Go from raw audio files to a text-audio dataset automatically with OpenAI's Whisper. ☆135 · Updated last year
- Towards Building Text-To-Speech Systems for the Next Billion Users - Microsoft Research Intern Work - Accepted at ICASSP 2023 ☆53 · Updated last year
- Repository for "LLM-based speaker diarization correction: A generalizable approach" paper ☆12 · Updated 8 months ago
- Speaker Diarization with Transformers ☆64 · Updated 11 months ago
- [Interspeech 2024] Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation ☆150 · Updated 2 months ago
- Repository contains code to fine-tune WhisperASR model ☆23 · Updated 2 years ago
- Collection of scripts from mHuBERT-147. ☆24 · Updated 5 months ago
- (WACV 2025 - Oral) Vision-language conversation in 10 languages including English, Chinese, French, Spanish, Russian, Japanese, Arabic, H… ☆84 · Updated 2 months ago
- Open TTS models, built for streaming on the edge ☆39 · Updated last month
- This repo contains the official PyTorch implementation of AudioToken: Adaptation of Text-Conditioned Diffusion Models for Audio-to-Image … ☆81 · Updated 10 months ago
- Speech-MASSIVE is a multilingual Spoken Language Understanding (SLU) dataset comprising the speech counterpart for a portion of the MASSI… ☆21 · Updated 7 months ago
- [ICASSP 2025] Official Pytorch implementation of "Large Language Models are Strong Audio-Visual Speech Recognition Learners". ☆17 · Updated last month
- Collection of Open Source Speech Data ☆153 · Updated 5 months ago
- This repo contains the code for "Voice Disorder Analysis: A Transformer-based Approach", accepted at Interspeech 2024 ☆10 · Updated 10 months ago
- Audio tokenization, in the fastest way possible! ☆51 · Updated 8 months ago
- Datasets for turn-taking research ☆12 · Updated last year
- Speaker diarization model ☆27 · Updated 2 years ago
- Implementation of Google's USM speech model in Pytorch ☆31 · Updated 2 weeks ago
- Official Repo for the Paper "AI as Humanity's Salieri: Quantifying Linguistic Creativity of Language Models via Systematic Attribution o… ☆12 · Updated 3 months ago
- Simple Diarization model ☆47 · Updated last year
- Repository having the code and models from the paper: data2vec-aqc: Search for the right Teaching Assistant in the Teacher-Student traini… ☆12 · Updated last year
- A python package for whisper normalizer ☆55 · Updated last week