andimarafioti/nano-parakeet

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/andimarafioti/nano-parakeet)

andimarafioti / nano-parakeet

Pure-PyTorch Parakeet TDT inference

☆48

Alternatives and similar repositories for nano-parakeet

Users that are interested in nano-parakeet are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

zhu-han / SpeechLLM
View on GitHub
LLM-based ASR recipe with Zipformer encoder and Qwen LLM
☆34Sep 25, 2025Updated 9 months ago
LilDevsy0117 / Ultra-Sortformer
View on GitHub
Ultra-Sortformer for Scalable Speaker Diarization
☆27Apr 9, 2026Updated 3 months ago
alefiury / SE-R-2022-SER-Track
View on GitHub
Code for the winning solution in the SE&R 2022 Challenge - SER track.
☆16Mar 28, 2023Updated 3 years ago
zhou-feifei / parakeet-tdt-0.6b-v2-Batch-Transcriber
View on GitHub
A high-performance batch audio transcription tool using nvidia/parakeet-tdt-0.6b-v2 to generate accurate, well-segmented SRT subtitles, w…
☆18Dec 9, 2025Updated 7 months ago
JusperLee / Gull-Codec-Training
View on GitHub
☆12Mar 11, 2025Updated last year
Managed hosting for WordPress and PHP on Cloudways • Ad
Managed hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
bagustris / s3prl-ser
View on GitHub
S3PRL for Speech Emotion Recognition (see s3prl > downstream)
☆15Feb 28, 2026Updated 4 months ago
ZQuang2202 / Zipformer_Lightning
View on GitHub
An upgrade framework for train and validate compare with icefall using Lightning.
☆16Mar 26, 2025Updated last year
Sreyan88 / ReCLAP
View on GitHub
☆33Dec 23, 2025Updated 6 months ago
BUTSpeechFIT / SOT-DiCoW
View on GitHub
Multi-talker ASR based on DiCoW with Serialized Output Training
☆20Sep 18, 2025Updated 10 months ago
BUTSpeechFIT / AMI-diarization-setup
View on GitHub
☆54Oct 17, 2023Updated 2 years ago
ductuantruong / speaker_age_estimation_ssl_study
View on GitHub
[APSIPA'22] Exploring Speaker Age Estimation on Different Self-Supervised Learning Models
☆14Oct 19, 2022Updated 3 years ago
Deep-unlearning / Finetune-Voxtral-ASR
View on GitHub
☆38Oct 10, 2025Updated 9 months ago
pkufool / simple-wer
View on GitHub
A simple command line tool to calculate WER for ASR.
☆14Oct 14, 2024Updated last year
Deep-unlearning / Finetune-Parakeet
View on GitHub
☆25Oct 22, 2025Updated 8 months ago
1-Click AI Models by DigitalOcean Gradient • Ad
Deploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click. Zero configuration with optimized deployments.
ahaliassos / usr2
View on GitHub
PyTorch implementation of USR 2.0 (ICLR 2026)
☆15Apr 3, 2026Updated 3 months ago
zelaki / DisfluentFA
View on GitHub
A Weakly Supervised Forced Alignment for disluent speech
☆15Nov 12, 2023Updated 2 years ago
WhissleAI / PromptingNemo
View on GitHub
All-in-one Speech Transcription
☆11Jun 5, 2026Updated last month
audiodemo / voice-conversion
View on GitHub
Vocoder-Free Non-Parallel Conversion of Whispered Speech With Masked Cycle-Consistent Generative Adversarial Networks
☆17Aug 18, 2023Updated 2 years ago
nithinraok / MWSG_IEEE_Paper
View on GitHub
Codes for paper "Spectrogram enhancement using multiple window Savitzky Golay (MWSG) filter for robust bird sound detection" which is pub…
☆12Aug 17, 2017Updated 8 years ago
kaistmm / AlignDiT
View on GitHub
[ACM MM 2025] AlignDiT: Multimodal Aligned Diffusion Transformer for Synchronized Speech Generation
☆24Oct 28, 2025Updated 8 months ago
kandinskylab / kvae-audio
View on GitHub
KVAE-Audio: a continuous full-band audio waveform autoencoder
☆98Jun 30, 2026Updated 2 weeks ago
FireRedTeam / FireRedVAD
View on GitHub
A SOTA Industrial-Grade Voice Activity Detection & Audio Event Detection, supporting 100+ languages, outperforming Silero-VAD, TEN-VAD, F…
☆469May 6, 2026Updated 2 months ago
nttcslab-sp / mamba-diarization
View on GitHub
Official repository for Mamba-based Segmentation Model for Speaker Diarization
☆47May 13, 2025Updated last year
Managed hosting for WordPress and PHP on Cloudways • Ad
Managed hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
sarapapi / hearing2translate
View on GitHub
A unified evaluation suite for speech-to-text translation, covering SpeechLLMs, SFMs, and cascaded systems across diverse real-world spee…
☆32Apr 25, 2026Updated 2 months ago
line / WaveTrainerFit
View on GitHub
Official implementation of "Wave-Trainer-Fit: Neural Vocoder with Trainable Prior and Fixed-Point Iteration towards High-Quality Speech G…
☆16Feb 6, 2026Updated 5 months ago
PalabraAI / redimnet2
View on GitHub
This repository contains the official implementation and pretrained weights for the paper "ReDimNet2: Scaling Speaker Verification via Ti…
☆64Jul 9, 2026Updated last week
SWivid / AUV
View on GitHub
An All-in-One Speech, Sound, Music Codec with Single Nested Codebook
☆28Oct 11, 2025Updated 9 months ago
neuphonic / neucodec
View on GitHub
A package for NeuCodec: a 50hz, 0.8kbps, 24kHz audio codec.
☆161Jun 22, 2026Updated 3 weeks ago
Audio-AGI / dcase2024_task9_baseline
View on GitHub
Baseline for DCASE 2024 Task 9: "Language-Queried Audio Source Separation"
☆26Mar 27, 2024Updated 2 years ago
luomingshuang / k2-speechbrain
View on GitHub
In this repository, I try to combine k2 with speechbrain to decode well and fastly.
☆16Jun 17, 2022Updated 4 years ago
AmphionTeam / FlexiCodec
View on GitHub
[ICLR2026] FlexiCodec: A Dynamic Neural Audio Codec for Low Frame Rates
☆50Jul 1, 2026Updated 2 weeks ago
Aratako / CALM-DACVAE
View on GitHub
An attempt to reproduce CALM (Continuous Audio Language Models) using DACVAE as the audio VAE.
☆17Feb 20, 2026Updated 5 months ago
Managed Database hosting by DigitalOcean • Ad
PostgreSQL, MySQL, MongoDB, Kafka, Valkey, and OpenSearch available. Automatically scale up storage and focus on building your apps.
llm-jp / llm-jp-moshi
View on GitHub
LLM-jp-Moshi: Japanese Full-duplex Spoken Dialogue Models
☆21Mar 17, 2026Updated 4 months ago
ajd12342 / paraspeechclap
View on GitHub
Codebase for 'ParaSpeechCLAP: A Dual-Encoder Speech-Text Model for Rich Stylistic Language-Audio Pretraining'
☆23Jun 20, 2026Updated 3 weeks ago
egruttadauria98 / SSpaVAlDo
View on GitHub
☆37Jan 6, 2026Updated 6 months ago
Blinorot / utmos-pytorch
View on GitHub
Unofficial fairseq-free PyTorch implementation of UTMOS (v1, 2022), matching the original system.
☆35Jun 6, 2026Updated last month
JaeBinCHA7 / DEMUCS-for-Speech-Enhancement
View on GitHub
We implemented the DEMUCS model for speech enhancement in the time-frequency domain, and additionally implemented HD-DEMUCS.
☆34Nov 8, 2023Updated 2 years ago
NTRLab / MediaSpeech
View on GitHub
☆22Jul 22, 2022Updated 3 years ago
smulelabs / windowed-roformer
View on GitHub
Official Repository for "Efficient Vocal Source Separation Through Windowed RoFormer"
☆45Oct 30, 2025Updated 8 months ago