laboroai/Laboro-ParaCorpus

Scripts for creating a Japanese-English parallel corpus and training NMT models

☆19

Alternatives and similar repositories for Laboro-ParaCorpus

Users who are interested in Laboro-ParaCorpus are comparing it to the repositories listed below.

mynlp / niilc-qa
NIILC QA data
☆18 Nov 20, 2015 (updated 10 years ago)
tsuruoka-lab / AMI-Meeting-Parallel-Corpus
AMI Meeting Parallel Corpus
☆13 Dec 11, 2020 (updated 5 years ago)
shyyhs / CourseraParallelCorpusMining
Coursera Corpus Mining and Multistage Fine-Tuning for Improving Lectures Translation
☆15 Aug 27, 2024 (updated last year)
Takeuchi-Lab-LM / python_asa
Python version of the Japanese semantic role labeling system (ASA)
☆22 Nov 11, 2022 (updated 3 years ago)
MorinoseiMorizo / jparacrawl-finetune
An example usage of JParaCrawl pre-trained Neural Machine Translation (NMT) models.
☆105 Apr 29, 2021 (updated 5 years ago)
tsuruoka-lab / BSD
The Business Scene Dialogue corpus
☆75 Nov 10, 2021 (updated 4 years ago)
cl-tohoku / keigo_transfer_task
Evaluation dataset for the honorific (keigo) conversion task
☆21 Nov 24, 2022 (updated 3 years ago)
okoge-kaz / moe-recipes
Ongoing research training Mixture of Expert models.
☆22 Sep 16, 2024 (updated last year)
nlp-waseda / comet-atomic-ja
COMET-ATOMIC ja
☆31 Mar 8, 2024 (updated 2 years ago)
mingruimingrui / fast-mosestokenizer
c++ mosestokenizer
☆18 Mar 13, 2024 (updated 2 years ago)
Mao-KU / JASS
JASS: Japanese-specific Sequence to Sequence Pre-training for Neural Machine Translation (LREC2020) & Linguistically Driven Multi-Task Pr…
☆16 Jan 25, 2022 (updated 4 years ago)
tmu-nlp / paraphrase-corpus
Tokyo Metropolitan University Paraphrase Corpus (TMUP)
☆11 Jun 12, 2017 (updated 9 years ago)
ku-nlp / JMRD
Japanese Movie Recommendation Dialogue dataset
☆29 Jul 19, 2022 (updated 4 years ago)
roeeaharoni / sprp-acl2018
Source code and data for "Split and Rephrase: Better Evaluation and a Stronger Baseline"
☆15 Feb 15, 2019 (updated 7 years ago)
masora1030 / eigoyurusan
To be readable without enhancing english power.
☆10 Jul 22, 2020 (updated 6 years ago)
aistairc / trf
This is the repository for TRF (text readability features) publication.
☆37 Aug 27, 2019 (updated 6 years ago)
EhimeNLP / AcademicRoBERTa
☆10 Sep 3, 2024 (updated last year)
Unbabel / smaug
Python package to augment multilingual data
☆15 Feb 15, 2023 (updated 3 years ago)
yamachu / julius4seg
Segmentation support tool using Julius
☆14 Feb 13, 2020 (updated 6 years ago)
Mrpatekful / dialogue-reinforce
Training chatbot models with reinforcement learning in ParlAI.
☆17 Dec 8, 2022 (updated 3 years ago)
retarfi / language-pretraining
Pre-training Language Models for Japanese
☆50 Jul 2, 2023 (updated 3 years ago)
octanove / shiba
Pytorch implementation and pre-trained Japanese model for CANINE, the efficient character-level transformer.
☆89 Nov 3, 2023 (updated 2 years ago)
chaojiang06 / arXivEdits
Data for EMNLP 2022 paper "arXivEdits: Understanding the Human Revision Process in Scientific Writing".
☆14 Sep 30, 2023 (updated 2 years ago)
uhh-lt / par4Acad
Paraphrasing for academic texts
☆15 Dec 8, 2022 (updated 3 years ago)
google-research-datasets / c4repset
C4RepSet: Representative Subset from C4 data for Training Pre-trained LMs
☆11 Jan 13, 2023 (updated 3 years ago)
shihono / evaluate_japanese_w2v
script to evaluate pre-trained Japanese word2vec model on Japanese similarity dataset
☆12 Nov 4, 2024 (updated last year)
ku-nlp / bertknp
A Japanese dependency parser based on BERT
☆23 Oct 26, 2022 (updated 3 years ago)
frontainer / kuromoji-js-dictionary
kuromoji.js dictionary generator
☆17 Dec 4, 2019 (updated 6 years ago)
SHAREVOX / sharevox_training
Voice library training mechanism for SHAREVOX, free text-to-speech software that lets you create voices
☆17 Sep 21, 2023 (updated 2 years ago)
ikegami-yukino / dataset-list
lists of text corpus and more (mainly Japanese)
☆119 Jul 25, 2024 (updated 2 years ago)
eyalbd2 / Semantically-Driven-Sentence-Fusion
Official code for the paper "Semantically Driven Sentence Fusion: Modeling and Evaluation".
☆12 Aug 26, 2021 (updated 4 years ago)
SHAREVOX / sharevox_engine
Speech synthesis engine for SHAREVOX, free text-to-speech software that lets you create voices
☆14 Sep 18, 2023 (updated 2 years ago)
manzoku23 / PersonaGeneration
Create Persona dataset from reddit en movie category comment
☆11 Aug 6, 2021 (updated 4 years ago)
thammegowda / mtdata
A tool that locates, downloads, and extracts machine translation corpora
☆167 Apr 13, 2026 (updated 3 months ago)
reiyw / pdf2sb
View presentation slides in Scrapbox
☆15 Jun 5, 2025 (updated last year)
hppRC / defsent
DefSent: Sentence Embeddings using Definition Sentences
☆23 Aug 5, 2021 (updated 4 years ago)
rpryzant / JESC
A large parallel corpus of English and Japanese
☆90 Nov 1, 2017 (updated 8 years ago)
jonnyli1125 / gector-ja
BERT-based GEC tagging for Japanese
☆19 Aug 4, 2023 (updated 2 years ago)
megagonlabs / holobench
🫧 Code for Holistic Reasoning with Long-Context LMs: A Benchmark for Database Operations on Massive Textual Data (Maekawa*, Iso* et al.…
☆12 Feb 25, 2025 (updated last year)