Mashiro009/slidespeech_dl

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/Mashiro009/slidespeech_dl)

Mashiro009 / slidespeech_dl

☆24

Alternatives and similar repositories for slidespeech_dl

Users that are interested in slidespeech_dl are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

facebookresearch / fbai-speech
View on GitHub
Repo for the FB AI Speech team.
☆27Aug 24, 2021Updated 4 years ago
huangruizhe / ConEC
View on GitHub
☆14Jun 17, 2024Updated 2 years ago
lifeiteng / Aligner-SUPERB
View on GitHub
Speech-To-Text forced-alignment Speech processing Universal PERformance Benchmark
☆39May 7, 2025Updated last year
MaikeZuefle / f-actor
View on GitHub
☆28Jul 17, 2026Updated last week
wonjune-kang / expressive-speech-retrieval
View on GitHub
Expressive Speech Retrieval using Natural Language Descriptions of Speaking Style
☆15Aug 18, 2025Updated 11 months ago
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
DianboWork / M3T-CNERTA
View on GitHub
☆11Aug 10, 2022Updated 3 years ago
Aratako / CALM-DACVAE
View on GitHub
An attempt to reproduce CALM (Continuous Audio Language Models) using DACVAE as the audio VAE.
☆18Feb 20, 2026Updated 5 months ago
s920128 / NAR-BERT-ASR
View on GitHub
NAR-BERT-ASR
☆10Sep 27, 2021Updated 4 years ago
X-LANCE / MSDWILD
View on GitHub
[INTERSPEECH 2022] This dataset is designed for multi-modal speaker diarization and lip-speech synchronization in the wild.
☆66Jan 24, 2024Updated 2 years ago
jymh / SAP2-ASR
View on GitHub
☆26Jan 23, 2026Updated 6 months ago
huangruizhe / audio
View on GitHub
Data manipulation and transformation for audio signal processing, powered by PyTorch
☆10Sep 30, 2024Updated last year
FreedomIntelligence / S2S-Arena
View on GitHub
☆21Jun 4, 2026Updated last month
Splend1d / T5lephone
View on GitHub
Code for T5lephone: Bridging Speech and Text Self-supervised Models for Spoken Language Understanding via Phoneme level T5
☆19Nov 29, 2022Updated 3 years ago
zds-potato / multilingual-phonetic-sv
View on GitHub
☆10Dec 22, 2023Updated 2 years ago
Managed Kubernetes at scale on DigitalOcean • Ad
DigitalOcean Kubernetes includes the control plane, bandwidth allowance, container registry, automatic updates, and more for free.
R1ckShi / SeACo-Paraformer
View on GitHub
[ICASSP2023] Source code, model links and open test sets for paper SeACo-Paraformer.
☆44Mar 15, 2024Updated 2 years ago
kaistmm / VoxMM
View on GitHub
☆23May 11, 2026Updated 2 months ago
emonosuke / emoASR
View on GitHub
End-to-end MOdeling of ASR (Automatic Speech Recognition)
☆33Feb 16, 2023Updated 3 years ago
BUTSpeechFIT / SOT-DiCoW
View on GitHub
Multi-talker ASR based on DiCoW with Serialized Output Training
☆20Sep 18, 2025Updated 10 months ago
h-munakata / Lighthouse-Wrapper-for-Audio-Moment-Retrieval
View on GitHub
☆13Mar 23, 2026Updated 4 months ago
kjw11 / CSEnet-ASR
View on GitHub
Cross-Speaker Encoding Network for Multi-talker Speech Recognition
☆12Mar 14, 2025Updated last year
Sreyan88 / RECAP
View on GitHub
Code for ICASSP 2024 Paper: RECAP: Retrieval-Augmented Audio Captioning
☆16Jun 23, 2024Updated 2 years ago
jdh-algo / JoyTTS
View on GitHub
☆41Jul 15, 2025Updated last year
xiaomi-research / tts-prism
View on GitHub
☆47Apr 27, 2026Updated 2 months ago
1-Click AI Models by DigitalOcean Gradient • Ad
Deploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click. Zero configuration with optimized deployments.
MingLunHan / CIF-ColDec
View on GitHub
[ICASSP 2022] Improving End-to-End Contextual Speech Recognition with Fine-Grained Contextual Knowledge Selection
☆25Jul 14, 2026Updated last week
thuhcsi / NeuFA
View on GitHub
Neural network-based forced alignment with bidirectional attention mechanism
☆78Jan 17, 2025Updated last year
the-anonymous-bs / av-SALMONN
View on GitHub
av-SALMONN: Speech-Enhanced Audio-Visual Large Language Models
☆13May 8, 2024Updated 2 years ago
ahaliassos / usr2
View on GitHub
PyTorch implementation of USR 2.0 (ICLR 2026)
☆15Apr 3, 2026Updated 3 months ago
WingSingFung / TISDiSS
View on GitHub
Official implementation of TISDiSS, a scalable framework for discriminative source separation.
☆16Oct 19, 2025Updated 9 months ago
AudenAI / Auden
View on GitHub
☆71Apr 2, 2026Updated 3 months ago
hbwu-ntu / EmoCtrlTTS-Eval
View on GitHub
☆19Aug 23, 2024Updated last year
Tencent / StableToken
View on GitHub
[ICLR 2026] StableToken: A state-of-the-art noise-robust semantic speech tokenizer featuring Voting-LFQ for resilient SpeechLLMs.
☆33Feb 27, 2026Updated 4 months ago
aixplain / NoRefER
View on GitHub
☆18Jun 5, 2026Updated last month
End-to-end encrypted cloud storage - Proton Drive • Ad
Special offer: 40% Off Yearly / 80% Off First Month. Protect your most important files, photos, and documents from prying eyes.
ajd12342 / paraspeechclap
View on GitHub
Codebase for 'ParaSpeechCLAP: A Dual-Encoder Speech-Text Model for Rich Stylistic Language-Audio Pretraining'
☆23Jun 20, 2026Updated last month
yhytoto12 / Behavior-SD
View on GitHub
Official Implementation of NAACL 2025 Paper: Behavior-SD: Behaviorally Aware Spoken Dialogue Generation with Large Language Models
☆18Apr 30, 2025Updated last year
skit-ai / N-Best-ASR-Transformer
View on GitHub
Code for ACL-IJCNLP 2021 paper "N-Best-ASR-Transformer: Enhancing SLU Performance using Multiple ASR Hypotheses."
☆17Nov 30, 2021Updated 4 years ago
ahaliassos / raven
View on GitHub
Official implementation of RAVEn (ICLR 2023) and BRAVEn (ICASSP 2024)
☆82Feb 27, 2025Updated last year
zsLin177 / CopyNE
View on GitHub
☆20Jun 3, 2024Updated 2 years ago
isjwdu / DFADD
View on GitHub
Official Implementation and Dataset of paper - DFADD: The Diffusion and Flow-matching based Audio Deepfake Dataset
☆16Apr 7, 2025Updated last year
X-LANCE / SLAM-LLM
View on GitHub
A Framework for Speech, Language, Audio, Music Processing with Large Language Model
☆1,049Jan 15, 2026Updated 6 months ago