Eureka-Audio: A 1.7B lightweight audio–language model that matches 7B–30B models on ASR, audio understanding, and paralinguistic reasoning.
☆25Updated this week
Alternatives and similar repositories for Eureka-Audio
Users that are interested in Eureka-Audio are comparing it to the libraries listed below
Sorting:
- Code for ICASSP 2024 Paper: RECAP: Retrieval-Augmented Audio Captioning☆16Jun 23, 2024Updated last year
- Acoustic echo cancelation(AEC) is a main algorithm in the pipe line of acoustic devices with KWS or ASR. FNLMS is used.☆19Apr 22, 2019Updated 6 years ago
- CLASP: Contrastive Language-Speech Pretraining for Multilingual Multimodal Information Retrieval☆13Jun 27, 2025Updated 8 months ago
- semantic tokenizer for speech and music☆21Jul 6, 2025Updated 7 months ago
- Onset-and-Offset-Aware Sound Event Detection☆21Feb 10, 2025Updated last year
- A chinese singing voice dataset, professional male singer, 105 songs, 132 minutes☆11Oct 19, 2023Updated 2 years ago
- A simple command line tool to calculate WER for ASR.☆14Oct 14, 2024Updated last year
- Code for the submitted 2021 DCASE Workshop paper: "Waveforms and Spectrograms: Enhancing Acoustic Scene Classification Using Multimodal F…☆16Aug 9, 2021Updated 4 years ago
- Code of the paper "Low-Latency Speech Separation Guided Diarization for Telephone Conversations"☆15Dec 22, 2022Updated 3 years ago
- Efficient Training for Multilingual Visual Speech Recognition: Pre-training with Discretized Visual Speech Representation (ACM MM 2024)☆20Mar 17, 2025Updated 11 months ago
- Official Implementation of GLAP - General Language Audio Pretraining☆61Jan 5, 2026Updated last month
- Towards a general language-audio model for computational paralinguistic tasks☆24Dec 14, 2024Updated last year
- Variable Bitrate Residual Vector Quantization for Audio Coding☆51May 1, 2025Updated 9 months ago
- Official code release for "RTFS-Net: Recurrent time-frequency modelling for efficient audio-visual speech separation", accepted ICLR 2024☆49Oct 14, 2025Updated 4 months ago
- Ultra-low-bitrate Speech Codec for Speech Language Modeling Applications☆87Dec 20, 2024Updated last year
- An End-to-End Pipeline for Enhanced French Text-to-Speech with SSML Prosody Control☆30Jan 13, 2026Updated last month
- My hybrid TTS network that combines, VALL-E, VoiceBox, SpeechFlow, Seamless and TortoiseTTS into one☆26Aug 5, 2024Updated last year
- Supervoice diffusion enhance☆28Jul 15, 2024Updated last year
- Lightweight Speech Representation Learning for One-Shot Voice Conversion☆24Dec 12, 2024Updated last year
- Pushing the Limits of Zero-shot End-to-End Speech Translation☆26Dec 12, 2024Updated last year
- System that ranks 2nd in DCASE 2022 Challenge Task 5: Few-shot Bioacoustic Event Detection☆28Jul 6, 2022Updated 3 years ago
- ☆68Dec 30, 2025Updated 2 months ago
- ☆30Updated this week
- CML-TTS: A Multilingual Dataset for Speech Synthesis☆33Jul 31, 2024Updated last year
- The implementation for "Empowering Whisper as a Joint Multi-Talker and Target-Talker Speech Recognition System".☆30Aug 2, 2025Updated 6 months ago
- Code and pretrained models for "DUB: Discrete Unit Back-translation for Speech Translation" (ACL 2023 Findings)☆28Jun 28, 2023Updated 2 years ago
- Official implementation of the paper titled "Age and Gender Recognition Using a Convolutional Neural Network with a Specially Designed Mu…☆27Mar 5, 2024Updated last year
- [EMNLP 2024] ESC: Efficient Speech Coding with Cross-Scale Residual Vector Quantized Transformers☆125Mar 20, 2025Updated 11 months ago
- Spherical residual vector quantization (SRVQ)☆31Aug 25, 2024Updated last year
- ☆32Mar 11, 2022Updated 3 years ago
- Official implementation of paper: Frame-Wise Breath Detection with Self-Training: An Exploration of Enhancing Breath Naturalness in Text-…☆40Sep 18, 2024Updated last year
- Generative Expressive Conversational Speech Synthesis (Accepted by MM'2024)☆78Nov 1, 2024Updated last year
- WavBench: Benchmarking Reasoning, Colloquialism, and Paralinguistics for End-to-End Spoken Dialogue Models☆22Feb 13, 2026Updated 2 weeks ago
- BAE-NET: A LOW COMPLEXITY AND HIGH FIDELITY BANDWIDTH-ADAPTIVE NEURAL NETWORK FOR SPEECH SUPER-RESOLUTION☆80Aug 20, 2024Updated last year
- This repository contains prompts & best practices to annotate audio clips with a very high degree of details using Audio-Language-Models☆35Oct 13, 2024Updated last year
- Python runtime for WeTextProcessing (does not depend on Pynini)☆48Nov 28, 2025Updated 3 months ago
- The implementation of paper "SpeechTripleNet: End-to-End Disentangled Speech Representation Learning for Content, Timbre and Prosody"☆34Nov 23, 2023Updated 2 years ago
- ViSpeR: Multilingual Audio-Visual Speech Recognition☆55Apr 17, 2025Updated 10 months ago
- it's ASR decoder and make graph project☆33May 26, 2022Updated 3 years ago