☆15Mar 12, 2024Updated last year
Alternatives and similar repositories for IS2024_stream_decoder_only_asr
Users that are interested in IS2024_stream_decoder_only_asr are comparing it to the libraries listed below
- Mandarin Chinese audio datasets aligned with Montreal Forced Aligner☆15Aug 13, 2024Updated last year
- Aligntune: A Modular Toolkit for Post Training Alignment of LLMs☆35Feb 26, 2026Updated last week
- Refined Direct Preference Optimization with Synthetic Data for Behavioral Alignment of LLMs☆13Feb 13, 2024Updated 2 years ago
- Prompting Large Language Models with Audio for General-Purpose Speech Summarization☆19May 14, 2025Updated 9 months ago
- Code: Completely Unsupervised Speech Recognition By A Generative Adversarial Network Harmonized With Iteratively Refined Hidden Markov Mod…☆25Dec 17, 2019Updated 6 years ago
- Alignment examples for Interspeech 2024☆27Jul 5, 2024Updated last year
- Collection of scripts from mHuBERT-147.☆32Nov 19, 2024Updated last year
- faster inference☆28Jan 20, 2025Updated last year
- Open-Source Turn-Taking Detection Model and Dataset for Full-Duplex Spoken Dialogue Systems☆82Jan 25, 2026Updated last month
- One command to build TLG.fst for WeNet.☆30Oct 11, 2022Updated 3 years ago
- Implementation of Google's USM speech model in Pytorch☆35Feb 7, 2026Updated 3 weeks ago
- AutoPrep: An Automatic Preprocessing Framework for In-the-Wild Speech Data☆35Dec 31, 2023Updated 2 years ago
- ☆151Apr 25, 2025Updated 10 months ago
- ☆80Aug 11, 2025Updated 6 months ago
- TASU: A New Style of Alignment of Speech LLM with only Text Training Data, zero-shot on ASR and Other SU tasks☆22Jan 19, 2026Updated last month
- MiniGPT-Pancreas: Multimodal Large language Model for Pancreas Cancer Classification and Detection☆11Sep 19, 2025Updated 5 months ago
- [ICLR 2026] DecAlign: Aligning Cross-Modal Semantics for Multimodal Foundation Models☆49Feb 5, 2026Updated last month
- Official Implementation of GLAP - General Language Audio Pretraining☆64Jan 5, 2026Updated 2 months ago
- Training, inference, and testing of the SAC speech codec model.☆99Nov 1, 2025Updated 4 months ago
- Torch Audio Forced Aligner for Mixed Chinese (Mandarin or Cantonese) and English.☆62Sep 5, 2025Updated 6 months ago
- WavReward: Spoken Dialogue Models With Generalist Reward Evaluators☆54May 15, 2025Updated 9 months ago
- PyTorch Implementation of Stepwise Monotonic Multihead Attention similar to Enhancing Monotonicity for Robust Autoregressive Transformer …☆39May 16, 2021Updated 4 years ago
- A benchmark dataset designed to support the development and evaluation of large language models (LLMs) for conversational mental health a…☆17Feb 24, 2025Updated last year
- [ACL 2025] OZSpeech: One-step Zero-shot Speech Synthesis with Learned-Prior-Conditioned Flow Matching☆45Feb 9, 2025Updated last year
- Code repository supporting the paper "Auto-Generating Weak Labels for Real & Synthetic Data to Improve Label-Scarce Medical Image Segment…☆11Apr 29, 2024Updated last year
- ☆14May 25, 2022Updated 3 years ago
- [IROS 2025] EgoLoc: Zero-Shot Temporal Interaction Localization for Egocentric Videos☆33Jan 13, 2026Updated last month
- ☆31Feb 26, 2026Updated last week
- A semi print-in-place hand for human-like manipulation, designed to be built by anyone.☆17Jan 5, 2026Updated 2 months ago
- Demo for DART, Audio Imagination workshop submission in NeurIPS 2024☆12Apr 15, 2025Updated 10 months ago
- Official implementation of the paper "Pretraining Language Models to Ponder in Continuous Space"☆25Jul 21, 2025Updated 7 months ago
- ☆16Jul 20, 2025Updated 7 months ago
- Android test project displaying live camera feed in a GLSurfaceView☆10Mar 8, 2015Updated 10 years ago
- ☆36Mar 14, 2025Updated 11 months ago
- [ICASSP 2024] KNN-CTC: Enhancing ASR via Retrieval of CTC Pseudo Labels☆42Mar 20, 2024Updated last year
- LLMVoX: Autoregressive Streaming Text-to-Speech Model for Any LLM☆299May 16, 2025Updated 9 months ago
- Official code for Dense-TSNet☆12Sep 17, 2024Updated last year
- A desktop video live wallpaper engine implemented in Python☆10Jun 2, 2022Updated 3 years ago
- A Benchmark and Evaluation Suite for Zero-shot Singing Voice Synthesis☆23Feb 11, 2026Updated 3 weeks ago