LeiLiLab/InfiniSST


☆25

Alternatives and similar repositories for InfiniSST

Users who are interested in InfiniSST are comparing it to the repositories listed below.


fyvo / WMT-Biomed-Test
☆13 · Aug 23, 2024 · Updated last year
hlt-mt / simulstream
simulstream is a Python library for simultaneous/streaming speech recognition and translation. It enables both the simulation with existi…
☆29 · Jul 9, 2026 · Updated last week
ffaltings / InteractiveTextGeneration
☆34 · Mar 25, 2023 · Updated 3 years ago
speechcatcher-asr / speechcatcher-data
☆11 · Sep 5, 2025 · Updated 10 months ago
anthony-wss / glm-4-voice-finetune
☆14 · Apr 4, 2025 · Updated last year
yxduir / m2m-70
☆18 · Jun 25, 2026 · Updated 3 weeks ago
dreamtheater123 / VoxEval
Github repository for ACL 2025 paper: VoxEval: Benchmarking the Knowledge Understanding Capabilities of End-to-End Spoken Language Models
☆24 · Jun 16, 2025 · Updated last year
aaronng91 / semantic-turn-detection
Script demonstrating how to use a language model for semantic turn detection. Refer to the blog post for full details.
☆18 · May 9, 2025 · Updated last year
YangXusheng-yxs / CodecFormer_5Hz
☆35 · Oct 23, 2025 · Updated 8 months ago
ictnlp / DiSeg
Source code for ACL 2023 paper "End-to-End Simultaneous Speech Translation with Differentiable Segmentation"
☆37 · Dec 6, 2023 · Updated 2 years ago
yxduir / LLM-SRT
☆28 · Mar 11, 2026 · Updated 4 months ago
tzyll / ChineseHP
Dataset for Pinyin Regularization in Error Correction for Chinese Speech Recognition with Large Language Models in Interspeech 2024.
☆16 · Jul 4, 2024 · Updated 2 years ago
llm-lab-org / CLASP
CLASP: Contrastive Language-Speech Pretraining for Multilingual Multimodal Information Retrieval
☆13 · Jun 27, 2025 · Updated last year
duyichao / NPDA-KNN-ST
Official implementation of EMNLP'2022 paper "Non-Parametric Domain Adaptation for End-to-End Speech Translation"
☆11 · Oct 26, 2022 · Updated 3 years ago
xuchennlp / S2T
The project for speech translation
☆12 · Sep 28, 2023 · Updated 2 years ago
fgnt / speaker_reassignment
Once more Diarization: Improving meeting transcription systems through segment-level speaker reassignment
☆14 · Feb 5, 2025 · Updated last year
MontrealCorpusTools / kalpy
Pybind11 bindings for Kaldi
☆15 · Jul 11, 2026 · Updated last week
Xianchao-Wu / wenet-deep-sparse-conformer
☆15 · Aug 25, 2022 · Updated 3 years ago
Mrunal-G / Casual-turn-taking-and-backchannel-prediction
☆16 · Jun 25, 2024 · Updated 2 years ago
ByteDance-Seed / Seed-X-7B
☆170 · Aug 18, 2025 · Updated 11 months ago
AmphionTeam / FlexiCodec
[ICLR2026] FlexiCodec: A Dynamic Neural Audio Codec for Low Frame Rates
☆50 · Jul 1, 2026 · Updated 2 weeks ago
jimbozhang / xares
A benchmark for evaluating audio encoders on various audio tasks.
☆55 · Apr 27, 2026 · Updated 2 months ago
BUTSpeechFIT / SOT-DiCoW
Multi-talker ASR based on DiCoW with Serialized Output Training
☆20 · Sep 18, 2025 · Updated 10 months ago
leohuang2013 / pyannote-audio_overlapped-speech-detection_cpp
C++ version of pyannote audio overlapped speech detection pipeline
☆13 · Feb 14, 2024 · Updated 2 years ago
thu-spmi / CTC-TTS
Code for CTC-TTS: LLM-based dual-streaming text-to-speech with CTC alignment, Interspeech 2026.
☆20 · Jun 9, 2026 · Updated last month
vivian556123 / NeurIPS2024-CoVoMix
Official repo for CoVoMix: Advancing Zero-Shot Speech Generation for Human-like Multi-talker Conversations
☆67 · Jan 16, 2025 · Updated last year
walker-hyf / GPT-Talker
Generative Expressive Conversational Speech Synthesis (Accepted by MM'2024)
☆78 · Nov 1, 2024 · Updated last year
Auroraaa86 / LCS-CTC
For IEEE ASRU (2025)
☆15 · Jun 21, 2025 · Updated last year
llm-jp / llama-mimi
Llama-Mimi is a speech language model that uses a unified tokenizer (Mimi) and a single Transformer decoder (Llama) to jointly model sequ…
☆31 · Sep 20, 2025 · Updated 10 months ago
frankenliu / LOAE
☆10 · Sep 25, 2024 · Updated last year
TSAI-CHANCHANG / CS_knowledge_point_note
A backup repository of scanned review notes for CS fundamentals and some major courses from four years of university.
☆12 · Jun 29, 2019 · Updated 7 years ago
avishaiElmakies / unsupervised_speech_segmentation_using_slm
☆20 · Jan 8, 2025 · Updated last year
Mddct / cosyvoice2-flow-optimized
faster inference
☆27 · Jan 20, 2025 · Updated last year
MCG-NJU / Video-DC
☆12 · Jul 30, 2025 · Updated 11 months ago
SpeechColab / GigaSpeech2
An evolving, large-scale and multi-domain ASR corpus for low-resource languages with automated crawling, transcription and refinement
☆197 · Apr 28, 2026 · Updated 2 months ago
Sreyan88 / LipGER
Code for InterSpeech 2024 Paper: LipGER: Visually-Conditioned Generative Error Correction for Robust Automatic Speech Recognition
☆19 · Jul 16, 2024 · Updated 2 years ago
Liu-Tianchi / Golden-Gemini-for-Speaker-Verification
Official release of pretrained models and codes for 'Golden Gemini Is All You Need: Finding the Sweet Spots for Speaker Verification'
☆15 · Jan 20, 2025 · Updated last year
sarapapi / hearing2translate
A unified evaluation suite for speech-to-text translation, covering SpeechLLMs, SFMs, and cascaded systems across diverse real-world spee…
☆32 · Apr 25, 2026 · Updated 2 months ago
vode / onlinePtrNet_disentanglement
☆13 · May 23, 2021 · Updated 5 years ago