facebookresearch/llama-hd-dataset

This is a balanced dataset for English homograph disambiguation (HD), generated with Meta's Llama 2-Chat 70B model.

☆22

Alternatives and similar repositories for llama-hd-dataset

Users that are interested in llama-hd-dataset are comparing it to the libraries listed below.

lars76 / fastspeech2-clean
Clean and modernized implementation of FastSpeech2/LightSpeech using IPA
☆18 · Aug 16, 2024 · Updated last year
fakerybakery / OpenF5-TTS
(WIP) A retrain of F5-TTS on permissively-licensed data
☆14 · Apr 6, 2025 · Updated last year
codebyzeb / g2p-plus
Grapheme-to-phoneme tool for corpus conversion, where phonemes match Phoible inventories
☆19 · Apr 10, 2025 · Updated last year
facebookresearch / emphassess
This repository presents an evaluation framework for speech-to-speech (S2S) models, following the methodology described in the EmphAsses …
☆25 · Jan 9, 2024 · Updated 2 years ago
p1an-lin-jung / wv_tts
☆19 · Mar 22, 2024 · Updated 2 years ago
poleval / 2021-punctuation-restoration
PolEval 2021 Task 1
☆15 · Jun 28, 2022 · Updated 4 years ago
audiodemo / voice-conversion
Vocoder-Free Non-Parallel Conversion of Whispered Speech With Masked Cycle-Consistent Generative Adversarial Networks
☆17 · Aug 18, 2023 · Updated 2 years ago
ZarahShibli / Arabic_Punctuation_Prediction
Sequence to sequence model for Arabic punctuation prediction.
☆12 · Feb 13, 2020 · Updated 6 years ago
Picovoice / text-to-speech-benchmark
Text-to-Speech Benchmark
☆26 · Apr 2, 2026 · Updated 3 months ago
leohuang2013 / pyannote-audio_overlapped-speech-detection_cpp
C++ version of pyannote audio overlapped speech detection pipeline
☆13 · Feb 14, 2024 · Updated 2 years ago
ogunlao / glowtts_stdp
Glow-TTS with Stochastic Duration Predictor and Stochastic Pitch Predictor
☆19 · Jun 5, 2023 · Updated 3 years ago
mjansche / tts-tutorial
Text-to-Speech tutorial at SLTU 2016
☆35 · May 10, 2016 · Updated 10 years ago
kadirnar / fast-dacvae
☆20 · Mar 17, 2026 · Updated 4 months ago
roedoejet / FastSpeech2_ACL2022_reproducibility
☆21 · Feb 27, 2024 · Updated 2 years ago
k9luo / Punctuation-Restoration
A TensorFlow Implementation of Punctuation Restoration.
☆18 · Nov 9, 2020 · Updated 5 years ago
Auroraaa86 / LCS-CTC
For IEEE ASRU(2025)
☆15 · Jun 21, 2025 · Updated last year
cldf / segments
Unicode Standard tokenization routines and orthography profile segmentation
☆41 · Mar 7, 2026 · Updated 4 months ago
fengpeng-yue / ASRTTS
ASR & TTS joint training, asr, tts, machine speech chain
☆16 · Oct 16, 2021 · Updated 4 years ago
cpii-cai / PunCantonese
A Benchmark Corpus for Low-Resource Cantonese Punctuation Restoration from Speech Transcripts
☆15 · Dec 3, 2024 · Updated last year
yazone / g2pE_mobile
g2p for english tts
☆19 · Nov 10, 2022 · Updated 3 years ago
idiap / knn-tts
Simple and lightweight Zero-shot Text-to-Speech (TTS) synthesis model
☆36 · Apr 29, 2025 · Updated last year
tabahi / contexless-phonemes-CUPE
pytorch model for contexless-phoneme prediction from speech audio
☆32 · Oct 30, 2025 · Updated 8 months ago
antimora / burn-flex
Portable, efficient CPU backend for Burn with SIMD, gemm, and no_std
☆17 · Apr 10, 2026 · Updated 3 months ago
wonjune-kang / lvc-vc
End-to-End Zero-Shot Voice Conversion with Location-Variable Convolutions
☆94 · Nov 6, 2023 · Updated 2 years ago
sigmeta / g2p-kd
Token-Level Ensemble Distillation for Grapheme-to-Phoneme Conversion
☆20 · Jul 9, 2019 · Updated 7 years ago
iamanigeeit / present
☆14 · Aug 19, 2024 · Updated last year
KdaiP / conformer-RoPE
Conformer block with Rotary Position Embedding, modified from lucidrains' implement
☆19 · Sep 13, 2024 · Updated last year
k2-fsa / colab
Colab notebooks for Next-gen Kaldi
☆31 · Oct 12, 2025 · Updated 9 months ago
reppy4620 / convnext_tts
Unofficial implementation of ConvNeXt-TTS powered by lightning
☆18 · Oct 20, 2024 · Updated last year
5Hyeons / StyleTTS2-Vocos
StyleTTS2 + Vocos as a Decoder
☆13 · Mar 24, 2025 · Updated last year
ArenAcikgoz / Whisper-Alignment
Forced alignment decoder for Whisper.
☆16 · Mar 13, 2024 · Updated 2 years ago
gnp / minbpe-rs
Port of Andrej Karpathy's minbpe to Rust
☆32 · May 6, 2024 · Updated 2 years ago
pashanitw / W2V2-BERT-ASR-Training
☆15 · Mar 25, 2024 · Updated 2 years ago
DDATT / Vits2-onnx-cpp
Simple inference for Vits2 TTS Using ONNXRUNTIME and espeak-ng on C++
☆19 · Apr 17, 2024 · Updated 2 years ago
crystal0913 / merlin-tts
c++ code for merlin tts
☆22 · Oct 19, 2019 · Updated 6 years ago
ictnlp / DST
DST is a Decoder-only simultaneous machine translation model, which can conduct policy decision and translation concurrently
☆11 · Jun 6, 2024 · Updated 2 years ago
RReverser / serde-ndim
Serde support for n-dimensional arrays from self-describing formats
☆13 · May 1, 2026 · Updated 2 months ago
wenet-e2e / WeSpeech-AI
Open Source Speech/Text Data on AI
☆19 · Sep 13, 2022 · Updated 3 years ago
thu-spmi / CTC-TTS
Code for CTC-TTS: LLM-based dual-streaming text-to-speech with CTC alignment, Interspeech 2026.
☆20 · Jun 9, 2026 · Updated last month