xiaomi-research/mecat

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/xiaomi-research/mecat)

xiaomi-research / mecat

☆42

Alternatives and similar repositories for mecat

Users that are interested in mecat are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

pkufool / simple-wer
View on GitHub
A simple command line tool to calculate WER for ASR.
☆14Oct 14, 2024Updated last year
LAION-AI / emotional-speech-annotations
View on GitHub
This repository contains prompts & best practices to annotate audio clips with a very high degree of details using Audio-Language-Models
☆35Oct 13, 2024Updated last year
xiaomi-research / dasheng-glap
View on GitHub
Official Implementation of GLAP - General Language Audio Pretraining
☆72May 14, 2026Updated last month
xiaomi-research / dasheng-denoiser
View on GitHub
Official PyTorch inference code for the Interspeech 2025 paper: Efficient Speech Enhancement via Embeddings from Pre-trained Generative A…
☆81Jun 16, 2025Updated last year
X-LANCE / public_talks
View on GitHub
Materials of public talks given By SJTU X-LANCE members
☆14Dec 3, 2022Updated 3 years ago
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
pengzhendong / speaker-diarization
View on GitHub
Offline Speaker Diarization with SenseVoice by Sherpa ONNX.
☆15Dec 23, 2024Updated last year
pengzhendong / wetext
View on GitHub
Python runtime for WeTextProcessing (does not depend on Pynini)
☆52Jun 11, 2026Updated last month
RicherMans / CED
View on GitHub
Source code for Consistent ensemble distillation for audio tagging
☆75Mar 20, 2026Updated 3 months ago
the-bird-F / GLM-Voice-RAG
View on GitHub
[EMNLP 2025 Findings] A complete cross-modal RAG system for end-to-end speech-to-speech large models, including ASR-based Retrieval and E…
☆31Jul 11, 2025Updated last year
Hunterhuan / sphereface2_speaker_verification
View on GitHub
Exploring Binary Classification Loss for Speaker Verification
☆18Jul 18, 2023Updated 2 years ago
pzelasko / kaldialign
View on GitHub
Python wrappers for Kaldi Levenshtein's distance and alignment code.
☆69Jun 15, 2026Updated 3 weeks ago
SonyResearch / VRVQ
View on GitHub
Variable Bitrate Residual Vector Quantization for Audio Coding
☆53May 1, 2025Updated last year
zeyuxie29 / AudioTime
View on GitHub
☆39Jul 4, 2024Updated 2 years ago
jimbozhang / xares-llm-template
View on GitHub
Template for creating audio encoders compatible with X-ARES
☆19Feb 11, 2026Updated 5 months ago
Simple, predictable pricing with DigitalOcean hosting • Ad
Always know what you'll pay with monthly caps and flat pricing. Enterprise-grade infrastructure trusted by 600k+ customers.
ASLP-lab / Easy-Turn
View on GitHub
Open-Source Turn-Taking Detection Model and Dataset for Full-Duplex Spoken Dialogue Systems
☆118Jan 25, 2026Updated 5 months ago
NiniAndy / Paraformer-V2
View on GitHub
来自于文章Paraformer-v2: An improved non-autoregressive transformer for noise-robust speech recognition
☆29Nov 20, 2024Updated last year
sevengivings / subtitle-extractor
View on GitHub
A Python script for AI speech recognition of video or audio file using whisper, stable-ts or faster-whisper and translation of subtitle u…
☆10Feb 17, 2025Updated last year
RicherMans / Dasheng
View on GitHub
Source for the Interspeech 2024 Paper "Scaling up masked audio encoder learning for general audio classification"
☆86Nov 7, 2025Updated 8 months ago
Sakshi113 / MMAU
View on GitHub
☆154Feb 9, 2026Updated 5 months ago
XiaoMi / dasheng
View on GitHub
Official PyTorch code for Deep Audio-Signal Holistic Embeddings
☆198Nov 7, 2025Updated 8 months ago
thuhcsi / SpeechCraft
View on GitHub
The official repository of SpeechCraft dataset, a large-scale expressive bilingual speech dataset with natural language descriptions.
☆193Feb 28, 2026Updated 4 months ago
xiaomi-research / xares-llm
View on GitHub
XARES-LLM
☆54Mar 26, 2026Updated 3 months ago
interactiveaudiolab / emphases
View on GitHub
Crowdsourced and Automatic Speech Prominence Estimation
☆27Apr 12, 2024Updated 2 years ago
Deploy on Railway without the complexity - Free Credits Offer • Ad
Connect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
wxqwinner / silero-vad-ncnn
View on GitHub
Silero VAD(ncnn): pre-trained enterprise-grade Voice Activity Detector.
☆26Aug 21, 2024Updated last year
pengzhendong / audio-pipeline
View on GitHub
☆23Oct 17, 2024Updated last year
dengcunqin / noise-reduction
View on GitHub
noise reduction
☆17Jul 3, 2024Updated 2 years ago
MorenoLaQuatra / vad
View on GitHub
Simple voice activity detection (VAD) algorithm in Python
☆15Aug 10, 2023Updated 2 years ago
QingyuLiu0521 / ICSD
View on GitHub
ICSD Dataset
☆42Jun 11, 2025Updated last year
jimbozhang / xares
View on GitHub
A benchmark for evaluating audio encoders on various audio tasks.
☆55Apr 27, 2026Updated 2 months ago
KeiKinn / ParaCLAP
View on GitHub
Towards a general language-audio model for computational paralinguistic tasks
☆30Dec 14, 2024Updated last year
yfyeung / CLSP
View on GitHub
[ACL 2026 Main] Open-Ended Speaking Style Modeling via Fine-Grained and Multi-Granular Contrastive Language-Speech Pre-training
☆104Apr 6, 2026Updated 3 months ago
X-LANCE / KWStreamingSearch
View on GitHub
☆92Jun 25, 2025Updated last year
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
0nutation / USLM
View on GitHub
Unified Speech Language Model for paper "SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models"(ICLR 2024)
☆152Sep 14, 2023Updated 2 years ago
R1ckShi / SeACo-Paraformer
View on GitHub
[ICASSP2023] Source code, model links and open test sets for paper SeACo-Paraformer.
☆44Mar 15, 2024Updated 2 years ago
yichen14 / FastAdaSP
View on GitHub
Code for the paper "FastAdaSP: An Efficient Multitask Inference Framework for Large Speech Language Models". @ EMNLP'24(Oral)
☆17Nov 14, 2024Updated last year
frankenliu / LOAE
View on GitHub
☆10Sep 25, 2024Updated last year
IvanWang0730 / StyleAP
View on GitHub
Code and Data for Paper "Controlling Styles in Neural Machine Translation with Activation Prompt" (ACL 2023 Findings)
☆16Dec 20, 2022Updated 3 years ago
raymondxyy / pyaudlib
View on GitHub
A speech signal processing library in Python with emphasis on deep learning.
☆31Apr 13, 2026Updated 2 months ago
williamzhangsjtu / MulSupCon
View on GitHub
[AAAI 2024] Multi-Label Supervised Contrastive Learning (MulSupCon)
☆26May 6, 2024Updated 2 years ago