khiajohnson/SpiCE-Corpus

An open-access corpus of conversational bilingual speech in Cantonese and English

☆40

Alternatives and similar repositories for SpiCE-Corpus

Users interested in SpiCE-Corpus are comparing it to the repositories listed below.

google-research-datasets / uninum
A database of number names for 186 languages, locales, and scripts
☆67 · Mar 3, 2023 · Updated 3 years ago
indiejoseph / hkcc-corpus
Packager for the Mid-20th-Century Hong Kong Cantonese Corpus (《香港二十世紀中期粵語語料庫》)
☆16 · Apr 12, 2016 · Updated 10 years ago
jeongukjae / namuwiki-corpus
A Namuwiki dataset segmented into sentences. Download it from Releases or through tfds-korean.
☆19 · Jun 16, 2021 · Updated 5 years ago
charlesliucn / LanMIT
📖 LanMIT: A Toolkit for Improving Language Models in Low-resourced Speech Recognition based on Kaldi.
☆22 · Jul 12, 2019 · Updated 7 years ago
vadimkantorov / inferspeech
PyTorch speech2text inference script for the NVIDIA OpenSeq2Seq wav2letter model variant
☆10 · Aug 12, 2019 · Updated 6 years ago
whmnoe4j / work12
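Early computers used 7-bit ASCII encoding; to handle Chinese characters, programmers designed GB2312 for Simplified Chinese and Big5 for Traditional Chinese. GB2312 (1980) contains 7,445 characters in total, including 6,763 Chinese characters and 682 other symbols. In the Chinese-character area, the high byte of the internal code ranges from B0 to F7 and the low byte from A1 to FE, occupying…
☆10 · Sep 10, 2017 · Updated 8 years ago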
achaudhury / shmoop-corpus
The Shmoop Corpus
☆17 · Oct 27, 2020 · Updated 5 years ago
MrBananaHuman / KoGPT2ForParaphrasing
TEMP
☆34 · Apr 2, 2020 · Updated 6 years ago
Kyubyong / KoParadigm
KoParadigm: Korean Inflectional Paradigm Generator
☆60 · Nov 23, 2022 · Updated 3 years ago
Victorwz / tod_as_nlg
Official implementation of SIGIR 2022 Paper "Task-Oriented Dialogue System as Natural Language Generation".
☆14 · Apr 6, 2022 · Updated 4 years ago
choe-hyonsu-gabrielle / korean-amr-corpus
Korean Abstract Meaning Representation (AMR) Corpus
☆10 · Feb 27, 2022 · Updated 4 years ago
CanCLID / awesome-cantonese-nlp
A curated list of resources dedicated to Natural Language Processing (NLP) of Cantonese | 粵語 NLP
☆95 · Oct 17, 2021 · Updated 4 years ago
satya77 / Entity_Embedding
Reference implementation of the paper "Word Embeddings for Entity-annotated Texts"
☆18 · Apr 12, 2019 · Updated 7 years ago
Yangyangii / TPGST-Tacotron
Google's TPGST reimplementation.
☆34 · Dec 11, 2019 · Updated 6 years ago
Kaleidophon / token2index
A lightweight but powerful library to build token indices for NLP tasks, compatible with major Deep Learning frameworks like PyTorch and …
☆50 · Dec 6, 2024 · Updated last year
yseokchoi / SejongTree2Dependency
A tool for converting the Sejong syntactic parsing corpus into dependency structures
☆10 · Sep 7, 2018 · Updated 7 years ago
warnikchow / prosem
Prosody-semantics Interface in Seoul Korean
☆12 · Oct 9, 2020 · Updated 5 years ago
Victorwz / zs-nmt-dae
Official implementation of EMNLP 2021 Paper "Rethinking Zero-shot Neural Machine Translation: From a Perspective of Latent Variables"
☆12 · May 15, 2023 · Updated 3 years ago
charlesXu86 / char_featurizer
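A Chinese character feature extraction tool that can extract pronunciation (initial, final, tone), glyph (components, radical), Four-Corner code, and other features, which can also be fed into models as tensors
☆138 · May 25, 2020 · Updated 6 years ago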
ModuNLP / hacking_transformers
☆11 · Aug 12, 2020 · Updated 5 years ago
ChiYeungLaw / Awsome-Cross-Lingual-Word-Embeddings
Papers, code and datasets about Cross-lingual Word Embeddings
☆21 · Jan 23, 2022 · Updated 4 years ago
cifkao / ismir2019-music-style-translation
The code for the ISMIR 2019 paper “Supervised symbolic music style translation using synthetic data”.
☆28 · Nov 21, 2022 · Updated 3 years ago
berniey / hanziconv
Hanzi Converter for Traditional and Simplified Chinese
☆190 · Mar 28, 2020 · Updated 6 years ago
Gyeongmin47 / KoCHET-A-Korean-Cultural-Heritage-corpus-for-Entity-related-Tasks
☆13 · Nov 30, 2022 · Updated 3 years ago
PunkMale / OR-Gate
Official PyTorch implementation of the paper "Robust Training for Speaker Verification against Noisy Labels" in INTERSPEECH 2023.
☆12 · Oct 23, 2023 · Updated 2 years ago
naver-ai / neuralwoz
NeuralWOZ: Learning to Collect Task-Oriented Dialogue via Model-based Simulation (ACL-IJCNLP 2021)
☆36 · Jul 22, 2021 · Updated 5 years ago
junekihong / beam-span-parser
A DP beam-search extension of Mitchell Stern's span-based neural constituency parser
☆11 · Aug 24, 2022 · Updated 3 years ago
Idlak / Living-Audio-Dataset
A "Crowd-Built" continuously growing speech dataset with transcripts. The dataset contains multiple languages and is intended for anyone …
☆43 · Aug 3, 2022 · Updated 3 years ago
amazon-science / proteno
This repository contains data used in the NAACL 2021 Paper - Proteno: Text Normalization with Limited Data for Fast Deployment in Text to…
☆45 · May 25, 2021 · Updated 5 years ago
zhangyics / Chinese-abbreviation-dataset
This is a corpus of Chinese abbreviations, including negative full forms.
☆198 · Jul 17, 2021 · Updated 5 years ago
j-min / Easy-Namuwiki-Extractor
Easy Namuwiki Extractor
☆29 · Nov 29, 2016 · Updated 9 years ago
zeeeyang / two-local-neural-conparsers
Span and Rule Models for Neural Constituent Parsing
☆10 · Jun 11, 2018 · Updated 8 years ago
jacksonllee / pycantonese
Cantonese Linguistics and NLP
☆413 · May 26, 2026 · Updated 2 months ago
yafuly / CoGnition
☆17 · Nov 10, 2021 · Updated 4 years ago
duyichao / E2E-ST-TDA
Official implementation of AAAI'2022 paper "Regularizing End-to-End Speech Translation with Triangular Decomposition Agreement"
☆17 · Dec 23, 2021 · Updated 4 years ago
songys / single_turn_dialogue
Data consisting only of dialogue example sentences extracted from a dictionary
☆16 · Apr 24, 2023 · Updated 3 years ago
LG-1 / video_music_book_datasets
NLP NER datasets video/music/book bio
☆90 · Jan 3, 2021 · Updated 5 years ago
dynilib / dynitag
Collaborative audio annotation tool
☆17 · Sep 16, 2022 · Updated 3 years ago
SKTBrain / KVQA
Korean Visual Question Answering
☆59 · Feb 18, 2020 · Updated 6 years ago