ALM-LAB / PACE
PACE (Podcast AI for Chapters and Episodes) is a semantic search engine that helps you find the information you need, both across and within podcasts (project for the AssemblyAI Winter 2022 Hackathon).
☆15Updated 2 years ago
Alternatives and similar repositories for PACE
Users that are interested in PACE are comparing it to the libraries listed below
- Joint speech-language model - respond directly to audio!☆30Updated last year
- Prompting Whisper for Audio-Visual Speech Recognition, Code-Switched Speech Recognition, and Zero-Shot Speech Translation☆144Updated last year
- A simple, consistent and extendable toolkit for IndicTrans2☆32Updated last month
- Speaker Diarization with Transformers☆64Updated last year
- Official Repo for the Paper "AI as Humanity's Salieri: Quantifying Linguistic Creativity of Language Models via Systematic Attribution o…☆14Updated 4 months ago
- Provide Gradio custom components to make the diarization-based audio labeling process easier and faster.☆62Updated last week
- ☆62Updated 10 months ago
- ☆14Updated 2 years ago
- AudioBench: A Universal Benchmark for Audio Large Language Models☆218Updated 2 weeks ago
- ITALIC: An ITALian Intent Classification Dataset☆13Updated last year
- TurnGPT: a Transformer-based Language Model for Predicting Turn-taking in Spoken Dialog☆53Updated last year
- OpenAI Whisper Prompt Examples☆52Updated last year
- ☆18Updated last year
- Speaker diarization model☆27Updated 2 years ago
- ☆103Updated last week
- [ICASSP 2025] Official Pytorch implementation of "Large Language Models are Strong Audio-Visual Speech Recognition Learners".☆21Updated 2 months ago
- Generate visual podcasts about novels using open source models☆25Updated 2 years ago
- Datasets for turn-taking research☆13Updated last year
- Data and code for the paper "NormBank: A Knowledge Bank of Situational Social Norms"☆27Updated last year
- Speaker diarization service☆23Updated last month
- (WACV 2025 - Oral) Vision-language conversation in 10 languages including English, Chinese, French, Spanish, Russian, Japanese, Arabic, H…☆84Updated 3 months ago
- Ichigo Whisper is a compact (22M parameters), open-source speech tokenizer for the Whisper-medium, designed to enhance performance on mul…☆15Updated 4 months ago
- Repository containing code to fine-tune the Whisper ASR model☆23Updated 2 years ago
- Implementation of the model "AudioFlamingo" from the paper: "Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dial…☆40Updated 4 months ago
- This repository contains a short introduction on the topic of audio and speech processing -- from basics to applications.☆21Updated last year
- This public GitHub repository contains code for a fully self-hosted, on-premise transcription solution.☆52Updated 5 months ago
- The official repo of "WhiStress: Enriching Transcriptions with Sentence Stress Detection" (Interspeech 2025)☆20Updated last week
- This is a fork of the original fairseq repository (version 0.12.2) with added classes for training mHuBERT-147.☆17Updated 6 months ago
- ☆294Updated 11 months ago
- Go from raw audio files to a text-audio dataset automatically with OpenAI's Whisper.☆137Updated last year