An open-access corpus of conversational bilingual speech in Cantonese and English
☆40Apr 28, 2022Updated 3 years ago
Alternatives and similar repositories for SpiCE-Corpus
Users that are interested in SpiCE-Corpus are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- A database of number names for 186 languages, locales, and scripts☆67Mar 3, 2023Updated 3 years ago
- 《香港二十世紀中期粵語語料庫》打包器☆16Apr 12, 2016Updated 10 years ago
- Parallelized automatic corpus collection for ASR. Forked from https://github.com/EgorLakomkin/KTSpeechCrawler☆23Mar 21, 2021Updated 5 years ago
- 📖 LanMIT: A Toolkit for Improving Language Models in Low-resourced Speech Recognition based on Kaldi.☆22Jul 12, 2019Updated 6 years ago
- PyTorch speech2text inference script for the NVidia openseq2seq wav2letter model variant☆10Aug 12, 2019Updated 6 years ago
- GPU virtual machines on DigitalOcean Gradient AI • AdGet to production fast with high-performance AMD and NVIDIA GPUs you can spin up in seconds. The definition of operational simplicity.
- 早期的计算机使用7位的ASCII编码,为了处理汉字,程序员设计了用于简体中文的GB2312和用于繁体中文的big5。 GB2312(1980年)一共收录了7445个字符,包括6763个汉字和682个其它符号。汉字区的内码范围高字节从B0-F7,低字节从A1-FE,占用的码…☆10Sep 10, 2017Updated 8 years ago
- Minimal Tensorflow Docker image with SyntaxNet/DRAGNN based on Alpine linux☆32Oct 7, 2020Updated 5 years ago
- The Shmoop Corpus☆17Oct 27, 2020Updated 5 years ago
- ☆86Mar 11, 2020Updated 6 years ago
- TEMP☆34Apr 2, 2020Updated 6 years ago
- Resources for "Simple Speech Representation Learning from Perceptual Data".☆11Sep 18, 2023Updated 2 years ago
- KoParadigm: Korean Inflectional Paradigm Generator☆57Nov 23, 2022Updated 3 years ago
- Korean Abstract Meaning Representation (AMR) Corpus☆10Feb 27, 2022Updated 4 years ago
- Official implementation of SIGIR 2022 Paper "Task-Oriented Dialogue System as Natural Language Generation".☆14Apr 6, 2022Updated 4 years ago
- AI Agents on DigitalOcean Gradient AI Platform • AdBuild production-ready AI agents using customizable tools or access multiple LLMs through a single endpoint. Create custom knowledge bases or connect external data.
- NeuralWOZ: Learning to Collect Task-Oriented Dialogue via Model-based Simulation (ACL-IJCNLP 2021)☆36Jul 22, 2021Updated 4 years ago
- [Findings of NAACL2022] A Dog Is Passing Over The Jet? A Text-Generation Dataset for Korean Commonsense Reasoning and Evaluation☆11May 27, 2022Updated 3 years ago
- A curated list of resources dedicated to Natural Language Processing (NLP) of Cantonese | 粵語 NLP☆92Oct 17, 2021Updated 4 years ago
- Reference implementation of the paper "Word Embeddings for Entity-annotated Texts"☆18Apr 12, 2019Updated 7 years ago
- This repository contains data used in the NAACL 2021 Paper - Proteno: Text Normalization with Limited Data for Fast Deployment in Text to…☆45May 25, 2021Updated 4 years ago
- A "Crowd-Built" continuously growing speech dataset with transcripts. The dataset contains multiple languages and is intended for anyone …☆43Aug 3, 2022Updated 3 years ago
- A lightweight but powerful library to build token indices for NLP tasks, compatible with major Deep Learning frameworks like PyTorch and …☆50Dec 6, 2024Updated last year
- some tutorials for blog: simonjisu.github.io☆23Mar 25, 2021Updated 5 years ago
- Wikipedia EXhaustive Entity Annotator (LREC 2020)☆16Apr 22, 2024Updated last year
- Wordpress hosting with auto-scaling - Free Trial • AdFully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
- Implementation of different noise embeddings for noise aware training of Kaldi acoustic models.☆13Feb 13, 2021Updated 5 years ago
- 세종 구문 분석 말뭉치의 의존 구문 구조로의 변환 도구☆10Sep 7, 2018Updated 7 years ago
- Prosody-semantics Interface in Seoul Korean☆12Oct 9, 2020Updated 5 years ago
- SOTA punctation restoration (for e.g. automatic speech recognition) deep learning model based on BERT pre-trained model☆182May 17, 2019Updated 6 years ago
- Official implementation of EMNLP 2021 Paper "Rethinking Zero-shot Neural Machine Translation: From a Perspective of Latent Variables"☆12May 15, 2023Updated 2 years ago
- ☆11Aug 12, 2020Updated 5 years ago
- 汉字字符特征提取工具,可以提取出字符中的字音(声母、韵母、声调)、字形(偏旁、部首)、四角编码等特征,同时可作为tensor输入到模型☆138May 25, 2020Updated 5 years ago
- Pumilio: A Web-Based Management System for Ecological Recordings☆13Oct 29, 2018Updated 7 years ago
- Papers, code and datasets about Cross-lingual Word Embeddings☆21Jan 23, 2022Updated 4 years ago
- Managed Database hosting by DigitalOcean • AdPostgreSQL, MySQL, MongoDB, Kafka, Valkey, and OpenSearch available. Automatically scale up storage and focus on building your apps.
- Hanzi Converter for Traditional and Simplified Chinese☆189Mar 28, 2020Updated 6 years ago
- g2pC: A Context-aware Grapheme-to-Phoneme Conversion module for Chinese☆244Jul 10, 2019Updated 6 years ago
- ☆12Nov 30, 2022Updated 3 years ago
- Official PyTorch implementation of the paper "Robust Training for Speaker Verification against Noisy Labels" in INTERSPEECH 2023.☆11Oct 23, 2023Updated 2 years ago
- ☆17Jun 30, 2020Updated 5 years ago
- Pronunciation lexicon covering both English and Chinese languages for Automatic Speech Recognition.☆262Oct 11, 2019Updated 6 years ago
- [ICLR 2022] "Audio Lottery: Speech Recognition Made Ultra-Lightweight, Noise-Robust, and Transferable", by Shaojin Ding, Tianlong Chen, Z…☆32Apr 8, 2022Updated 4 years ago