evelynkyl / yue_nmt
Python scripts and datasets of the "Extremely Low-Resource Neural Machine Translation: A Case Study of Cantonese" project
☆14 · Updated last year
Related projects:
- Cantonese-Mandarin unsupervised neural translation for sw project ☆24 · Updated last year
- An audio and transcribed corpus of contemporary Hong Kong Cantonese ☆34 · Updated 3 years ago
- Code for "Planning and Generating Natural and Diverse Disfluent Texts as Augmentation for Disfluency Detection" ☆15 · Updated 2 years ago
- Explore different ways to mix speech models (wav2vec2, hubert) and NLP models (BART, T5, GPT) together ☆42 · Updated last year
- SHAS: Approaching optimal Segmentation for End-to-End Speech Translation ☆37 · Updated last year
- BERT Tokenizer with vocabulary tailored for Cantonese ☆18 · Updated last year
- Repository containing the open source code of works published at the FBK MT unit. ☆41 · Updated 2 months ago
- Fine-tuning Wav2Vec2.0 on Common Voice (zh-HK) ☆14 · Updated 2 years ago
- Improving Disfluency Detection by Self-Training a Self-Attentive Model ☆47 · Updated 3 years ago
- Code for AAAI 2021 paper "Lexically Constrained Neural Machine Translation with Explicit Alignment Guidance" ☆24 · Updated last year
- The repository for the paper: Multilingual Translation via Grafting Pre-trained Language Models ☆24 · Updated 2 years ago
- Hong Kong Cantonese Corpus of transcribed speech (spontaneous speech, radio programmes and a monologue). ☆39 · Updated 6 months ago
- Multilingual and monolingual corpora focused on Caucasus languages for Natural Language Processing (NLP) ☆20 · Updated this week
- Whisper_MCE ☆13 · Updated 3 months ago
- This repository is the implementation of the HiPAMA architecture, introduced in the paper, Hierarchical Pronunciation Assessment with Mul… ☆27 · Updated 4 months ago
- Asian language bart models (En, Ja, Ko, Zh, ECJK) ☆65 · Updated 3 years ago
- This tool helps automatic generation of grammatically valid synthetic Code-mixed data by utilizing linguistic theories such as Equivalenc… ☆52 · Updated last month
- Complementary code for our paper Automatic punctuation restoration with BERT models ☆48 · Updated 10 months ago
- Disfluency Detection using Auto-Correlational Neural Networks ☆40 · Updated 3 years ago
- An open-access corpus of conversational bilingual speech in Cantonese and English ☆40 · Updated 2 years ago
- Dataset for TALLIP2019 paper "Ancient-Modern Chinese Translation with a New Large Training Dataset" ☆21 · Updated 2 years ago
- Dictionary of pairs of Korean word and IPA crawled from Wiktionary (Korean edition) ☆18 · Updated last year
- The case study and multilingual performance of ICASSP submission ☆20 · Updated last year
- Bicleaner fork that uses neural networks ☆37 · Updated last month
- Unsupervised spoken sentence embeddings ☆14 · Updated last year