A repository used to organize content related to Large Speech (Audio) Models, including papers, data, applications, tools, and so on.
☆28 · Nov 8, 2025 · Updated 4 months ago
Alternatives and similar repositories for Awesome-Large-Speech-Model
Users who are interested in Awesome-Large-Speech-Model are comparing it to the libraries listed below.
- Repo for the FB AI Speech team. ☆25 · Aug 24, 2021 · Updated 4 years ago
- Official implementation of the paper titled "Age and Gender Recognition Using a Convolutional Neural Network with a Specially Designed Mu… ☆27 · Mar 5, 2024 · Updated 2 years ago
- A list of conferences and journals relevant to machine translation ☆33 · Mar 17, 2022 · Updated 3 years ago
- Speech Emotion Recognition using Deep Learning ☆12 · May 24, 2021 · Updated 4 years ago
- [Lab] lab website ☆11 · Feb 24, 2026 · Updated 2 weeks ago
- WavBench: Benchmarking Reasoning, Colloquialism, and Paralinguistics for End-to-End Spoken Dialogue Models ☆27 · Feb 13, 2026 · Updated 3 weeks ago
- [ICASSP 2024] KNN-CTC: Enhancing ASR via Retrieval of CTC Pseudo Labels ☆42 · Mar 20, 2024 · Updated last year
- [ICME 2021 Oral] Official implementation for "FGF-GAN: A Lightweight Generative Adversarial Network for Pansharpening via Fast Guided Fil… ☆10 · Mar 29, 2022 · Updated 3 years ago
- Implementation of Qwen3-ASR-0.6B in GGML ☆46 · Feb 10, 2026 · Updated last month
- Semantic Map Learning of Traffic Light to Lane Assignment based on Motion Data ☆11 · Mar 30, 2024 · Updated last year
- Software defect management system - SpringBoot+Vue ☆10 · Jan 6, 2021 · Updated 5 years ago
- Onset-and-Offset-Aware Sound Event Detection ☆21 · Feb 10, 2025 · Updated last year
- Open-source Mandarin biased-word dataset ☆14 · Sep 21, 2023 · Updated 2 years ago
- The project for speech translation ☆12 · Sep 28, 2023 · Updated 2 years ago
- Automatically set up the AISHELL-4 and MSDWild datasets for use with pyannote-database (and pyannote-audio) ☆15 · Oct 22, 2025 · Updated 4 months ago
- CLASP: Contrastive Language-Speech Pretraining for Multilingual Multimodal Information Retrieval ☆13 · Jun 27, 2025 · Updated 8 months ago
- NAR-BERT-ASR ☆10 · Sep 27, 2021 · Updated 4 years ago
- Offline Speaker Diarization with SenseVoice using Sherpa ONNX. ☆15 · Dec 23, 2024 · Updated last year
- Optimized Analysis of Semantic Segmentation of Remote Sensing Images Based on FCN ☆13 · Nov 4, 2022 · Updated 3 years ago
- DOA estimation source code ☆10 · May 13, 2019 · Updated 6 years ago
- Self-hosted application that can generate illustrations from a novel by highlighting certain sentences ☆12 · Oct 12, 2025 · Updated 4 months ago
- ☆11 · Aug 10, 2022 · Updated 3 years ago
- ReaRAG: Knowledge-guided Reasoning Enhances Factuality of Large Reasoning Models with Iterative Retrieval Augmented Generation ☆25 · Aug 24, 2025 · Updated 6 months ago
- Unofficial reimplementation of CFNet: Cascade Fusion Network for Dense Prediction ☆10 · Mar 23, 2023 · Updated 2 years ago
- [ICTC'24] - "Voice-Based Age and Gender Recognition: A Comparative Study of LSTM, RezoNet and Hybrid CNNs-BiLSTM Architecture" by Nhut Mi… ☆10 · Jan 16, 2025 · Updated last year
- S3PRL for Speech Emotion Recognition (see s3prl > downstream) ☆15 · Feb 28, 2026 · Updated last week
- ☆14 · Jan 25, 2024 · Updated 2 years ago
- A simple command-line tool to calculate WER for ASR. ☆14 · Oct 14, 2024 · Updated last year
- A simple command-line tool / package that prints the dependencies of a Python project ☆28 · Apr 6, 2018 · Updated 7 years ago
- ICASSP 2024: Robust DOA estimation from deep acoustic imaging ☆22 · Apr 14, 2024 · Updated last year
- ☆16 · Feb 6, 2020 · Updated 6 years ago
- ZET-Speech: Zero-shot adaptive Emotion-controllable Text-to-Speech Synthesis with Diffusion and Style-based Models (TTS) ☆10 · Mar 9, 2024 · Updated 2 years ago
- 2021 MathorCup College Mathematical Modeling Challenge, Big Data Competition, Problem B: Remote Sensing Land Parcel Segmentation (National First Prize) ☆12 · May 1, 2021 · Updated 4 years ago
- Forced alignment decoder for Whisper. ☆14 · Mar 13, 2024 · Updated last year
- Homework for the Coursera NLP course. https://www.coursera.org/learn/language-processing/home/welcome ☆15 · Dec 7, 2022 · Updated 3 years ago
- ☆15 · Sep 13, 2022 · Updated 3 years ago
- The official implementation of Freeze-Omni. ☆15 · Jul 10, 2025 · Updated 8 months ago
- Tsinghua University 2019 Principles of Computer Networks router lab ☆14 · Jan 5, 2020 · Updated 6 years ago
- [EMNLP 2025 Findings] A complete cross-modal RAG system for end-to-end speech-to-speech large models, including ASR-based Retrieval and E… ☆28 · Jul 11, 2025 · Updated 7 months ago