Caption, translate, and optionally record in real time "what you hear" from speakers and microphone. Never miss part of the conversation again.
☆23 · Sep 11, 2025 · Updated 5 months ago
Alternatives and similar repositories for caption_anything
Users who are interested in caption_anything are comparing it to the libraries listed below.
- finetune the chain model based on cvte open source model without training any GMM for frame alignment ☆13 · Aug 6, 2020 · Updated 5 years ago
- 🌼 Daisy-TTS: Simulating Wider Spectrum of Emotions via Prosody Embedding Decomposition ☆14 · Nov 15, 2025 · Updated 3 months ago
- Repository for Accent Recognition (Hackathon @SLT2022) ☆38 · May 12, 2024 · Updated last year
- ☆13 · Jul 22, 2021 · Updated 4 years ago
- Official repository of the work "Low-complexity Unsupervised Audio Anomaly Detection exploiting Separable Convolutions and Angular Loss" … ☆10 · Nov 6, 2024 · Updated last year
- SChunk-Encoder (Transformer or Conformer) for streaming E2E ASR ☆11 · Oct 21, 2022 · Updated 3 years ago
- Russian phonetical transcription ☆11 · Nov 19, 2025 · Updated 3 months ago
- Code for the paper "RIR-in-a-Box : Estimating Room Acoustics from 3D Mesh Data through Shoebox Approximation" presented at Interspeech 20… ☆16 · Sep 1, 2024 · Updated last year
- ☆13 · Oct 9, 2025 · Updated 4 months ago
- A python script COMMAND LINE utility to AUTO GENERATE SUBTITLE FILE (using free Vosk Speech Recognition API) and TRANSLATED SUBTITLE FILE… ☆11 · May 5, 2024 · Updated last year
- Make Your Couch a Data Pre-Processing Cent… ☆15 · Nov 1, 2023 · Updated 2 years ago
- Grapheme-to-phoneme tool for corpus conversion, where phonemes match Phoible inventories ☆19 · Apr 10, 2025 · Updated 10 months ago
- Learning an Interpretable End-to-End Network for Real-Time Acoustic Beamforming ☆15 · Aug 20, 2024 · Updated last year
- eCMU: An Efficient Phase-aware Framework for Music Source Separation with Conformer (IEEE RIVF23) ☆10 · Oct 30, 2024 · Updated last year
- A tool to collect/validate audio recordings from workers on Amazon Mechanical Turk. Written in Python/Flask. (originally hosted on github… ☆14 · Dec 19, 2022 · Updated 3 years ago
- MT8816 based 16x16 Analog Switch Matrix ☆13 · Sep 23, 2023 · Updated 2 years ago
- ☆11 · Aug 11, 2023 · Updated 2 years ago
- Whisper finetuning ☆16 · Apr 9, 2025 · Updated 10 months ago
- Heltec Cubecell: development platform for PlatformIO ☆12 · Apr 8, 2024 · Updated last year
- KittenTTS is an ultra-lightweight, CPU-friendly text-to-speech model with 15M params for real-time, high-quality voices. Open source, fas… ☆23 · Updated this week
- A corpus of diacritized Hebrew texts (טקסט מנוקד) ☆11 · May 4, 2022 · Updated 3 years ago
- 🎵 muse: Music Separation ☆11 · Feb 14, 2024 · Updated 2 years ago
- ☆11 · Sep 1, 2024 · Updated last year
- This repository created for the NHN ASR hackathon competition. ☆11 · Sep 20, 2023 · Updated 2 years ago
- Project for HIDING SPEAKER’S SEX IN SPEECH USING ZERO-EVIDENCE SPEAKER REPRESENTATION IN AN ANALYSIS/SYNTHESIS PIPELINE ☆15 · Nov 30, 2022 · Updated 3 years ago
- official code for Dense-TSNet ☆12 · Sep 17, 2024 · Updated last year
- [ICASSP 2025] AnCoGen: Analysis, Control and Generation of Speech with a Masked Autoencoder ☆12 · Mar 11, 2025 · Updated 11 months ago
- speex aec kalman filter ☆15 · Mar 17, 2024 · Updated last year
- Docker for building an environment for Dutch online and offline ASR. ☆12 · Feb 2, 2021 · Updated 5 years ago
- T5Voice is a lightweight PyTorch implementation of T5-based text-to-speech synthesis, supporting both streaming and non-streaming speech … ☆28 · Nov 7, 2025 · Updated 4 months ago
- Multilingual acoustic word embedding approaches applied and evaluated on GlobalPhone data. ☆11 · Nov 3, 2020 · Updated 5 years ago
- Repository for "Training Audio Captioning Models without Audio" ☆10 · Sep 26, 2023 · Updated 2 years ago
- [INTERSPEECH 2024] Official code for VoxSim: A perceptual voice similarity dataset ☆12 · Sep 29, 2025 · Updated 5 months ago
- CLASP: Contrastive Language-Speech Pretraining for Multilingual Multimodal Information Retrieval ☆13 · Jun 27, 2025 · Updated 8 months ago
- ☆13 · Oct 3, 2025 · Updated 5 months ago
- ☆13 · Oct 25, 2024 · Updated last year
- ☆13 · Apr 14, 2024 · Updated last year
- Using YouTube to prepare a speech recognition dataset for any language ☆10 · Mar 30, 2021 · Updated 4 years ago
- ☆10 · Oct 16, 2025 · Updated 4 months ago