shyyhs/CourseraParallelCorpusMining

Coursera Corpus Mining and Multistage Fine-Tuning for Improving Lectures Translation

☆15

Alternatives and similar repositories for CourseraParallelCorpusMining

Users who are interested in CourseraParallelCorpusMining are comparing it to the libraries listed below.

SAP / software-documentation-data-set-for-machine-translation
A parallel evaluation data set of SAP software documentation with document structure annotation
☆15 · Jun 12, 2026 · Updated last month

laboroai / Laboro-ParaCorpus
Scripts for creating a Japanese-English parallel corpus and training NMT models
☆19 · Nov 9, 2021 · Updated 4 years ago

browsermt / students
Efficient teacher-student models and scripts to make them
☆57 · Dec 16, 2023 · Updated 2 years ago

mynlp / niilc-qa
NIILC QA data
☆18 · Nov 20, 2015 · Updated 10 years ago

ZurichNLP / domain-robustness
☆13 · Dec 11, 2020 · Updated 5 years ago

hangyav / UnsupPSE
Unsupervised parallel sentence extraction from comparable corpora
☆16 · Aug 6, 2019 · Updated 6 years ago

viswavi / languageid
Identifying the language of input text using character-level n-grams, with support for 45 languages
☆11 · Dec 26, 2022 · Updated 3 years ago

ZurichNLP / nmtscore
A library of translation-based text similarity measures
☆25 · Dec 11, 2023 · Updated 2 years ago

ku-nlp / text-cleaning
A powerful text cleaner for Japanese web texts
☆12 · Jan 20, 2024 · Updated 2 years ago

NYUCCL / duolingoSLAM
2018 Duolingo Shared Task on Second Language Acquisition Modeling (SLAM) (http://sharedtask.duolingo.com/)
☆12 · May 31, 2018 · Updated 8 years ago

masayu-a / WLSP-familiarity
Word Familiarity Rate for 'Word List by Semantic Principles (WLSP)'
☆12 · Jan 2, 2025 · Updated last year

carina-kauf / better-mlm-scoring
[Kauf & Ivanova, ACL 2023] A Better Way to Do Masked Language Model Scoring
☆12 · Dec 1, 2023 · Updated 2 years ago

ku-nlp / bertknp
A Japanese dependency parser based on BERT
☆23 · Oct 26, 2022 · Updated 3 years ago

giellalt / lang-crk
Finite state and Constraint Grammar based analysers and proofing tools, and language resources for the Plains Cree language
☆16 · Jun 3, 2026 · Updated last month

LivingSkyTechnologies / Dense_Article_Dataset_DAD
Dense Article Dataset (DAD): A Benchmark Dataset for Document Layout Analysis
☆16 · Jan 13, 2022 · Updated 4 years ago

writecrow / text_processing
A repository for text_processing tools used by crow
☆12 · Mar 21, 2025 · Updated last year

megagonlabs / ginza-transformers
Use custom tokenizers in spacy-transformers
☆16 · Aug 9, 2022 · Updated 3 years ago

akirakubo / bert-japanese-aozora
Japanese BERT trained on Aozora Bunko and Wikipedia, pre-tokenized by MeCab with UniDic & SudachiPy
☆40 · Aug 8, 2020 · Updated 5 years ago

emorynlp / ddr
Deep Dependency Representation
☆16 · May 9, 2018 · Updated 8 years ago

ku-nlp / KWDLC
Kyoto University Web Document Leads Corpus
☆84 · Dec 18, 2023 · Updated 2 years ago

roeeaharoni / string-to-tree-nmt
Source code and data for the paper "Towards String-to-Tree Neural Machine Translation"
☆16 · Dec 31, 2017 · Updated 8 years ago

KKodiac / Covid19_Stats
Scrapes domestic and international data on confirmed, recovered, and deceased COVID-19 cases.
☆10 · Jun 6, 2021 · Updated 5 years ago

tmu-nlp / TwitterCorpus
Tokyo Metropolitan University Japanese Twitter Corpus
☆21 · Mar 14, 2016 · Updated 10 years ago

Takeuchi-Lab-LM / python_asa
Python version of the Japanese semantic role labeling system (ASA)
☆22 · Nov 11, 2022 · Updated 3 years ago

salesforce / localization-xml-mt
A High-Quality Multilingual Dataset for Structured Documentation Translation
☆39 · May 1, 2025 · Updated last year

rainarch / ChunkLinkCTB
A tool for extracting chunks from Penn Chinese Treebank
☆18 · Jan 12, 2018 · Updated 8 years ago

zesch / lang-tech-teaching-public
☆16 · Jan 24, 2022 · Updated 4 years ago

ryanmcd / uni-dep-tb
A set of treebanks for multiple languages annotated in basic Stanford-style dependencies.
☆68 · Aug 29, 2019 · Updated 6 years ago

JieyuZ2 / doc2graph
Code for the paper "Neural Concept Map Generation for Effective Document Classification with Interpretable Structured Summarization" SIGI…
☆20 · Jan 10, 2021 · Updated 5 years ago

toasted-nutbread / yomichan-bccwj-frequency-dictionary
Script to create a frequency dictionary for Yomichan
☆18 · Sep 11, 2022 · Updated 3 years ago

bestian / q-moedict
Backup Moedict (moedict PWA & app, built with Quasar)
☆12 · Jul 31, 2025 · Updated 11 months ago

masakhane-io / africomet
COMET for African languages
☆11 · Jan 24, 2025 · Updated last year

Taiwanese-Corpus / Ogawa-Naoyoshi_1931-1932
Taiwanese-language translation of the 台日大辭典 (Comprehensive Taiwanese-Japanese Dictionary)
☆11 · Jul 12, 2016 · Updated 10 years ago

bicici / FDA
Feature Decay Algorithms
☆11 · Mar 5, 2014 · Updated 12 years ago

ARBML / dar
A simple semi-supervised approach for creating huggingface data script loaders and upload to the hub.
☆11 · Jun 23, 2024 · Updated 2 years ago

ltgoslo / factorizer
☆16 · May 14, 2024 · Updated 2 years ago

banyh / PyStanfordNLP
A Python Wrapper of Stanford Chinese Segmenter
☆20 · Aug 2, 2017 · Updated 8 years ago

dustinvtran / blog
All code and content for my blog.
☆15 · Sep 23, 2018 · Updated 7 years ago

shahparth123 / eng_guj_parallel_corpus
This repository contains a dataset for English-to-Gujarati translation
☆10 · Dec 27, 2020 · Updated 5 years ago