clab/wikipedia-parallel-titles

Tools for extracting parallel corpora from article titles across languages in Wikipedia

☆74

Alternatives and similar repositories for wikipedia-parallel-titles

Users that are interested in wikipedia-parallel-titles are comparing it to the libraries listed below.

chrishokamp / dl4mt_exercises
Lab exercises for the DL4MT winter school at DCU
☆15 · Oct 21, 2015 · Updated 10 years ago
roeeaharoni / dynmt-py
Neural machine translation implementation using dynet's python bindings
☆17 · Jan 24, 2018 · Updated 8 years ago
ucam-smt / sgnmt
Decoding platform for machine translation research
☆54 · Aug 24, 2019 · Updated 6 years ago
neulab / covid19-datashare
A repo for sharing language resources related to the outbreak (in machine readable format)
☆25 · Sep 22, 2025 · Updated 9 months ago
rsennrich / lingeval97
☆18 · Oct 5, 2017 · Updated 8 years ago
travisbrown / mstparser
☆17 · Apr 19, 2016 · Updated 10 years ago
coastalcph / rungsted
Fast structured perceptron sequential labeler
☆15 · Dec 8, 2015 · Updated 10 years ago
sean-chester / generalised-brown
C++ implementation of Generalised Brown clustering and python scripts for feature generation
☆41 · Apr 8, 2016 · Updated 10 years ago
marliesvanderwees / dds-nmt
Dynamic data selection for neural machine translation
☆20 · Jan 28, 2018 · Updated 8 years ago
karlmoritz / bicvm
BiCVM Code
☆45 · May 14, 2018 · Updated 8 years ago
prasastoadi / parallel-corpora-en-id
English - Indonesian parallel corpora
☆17 · Aug 6, 2018 · Updated 7 years ago
dbpedia-spotlight / evaluation-datasets
Will store links to known evaluation datasets alongside stats to characterize them
☆24 · Mar 9, 2016 · Updated 10 years ago
jiyfeng / entitynlm
☆44 · Nov 30, 2017 · Updated 8 years ago
lmthang / bivec
Train bilingual embeddings as described in our NAACL 2015 workshop paper "Bilingual Word Representations with Monolingual Quality in Mind…
☆79 · Jun 15, 2019 · Updated 7 years ago
peoplepattern / lib-text
A little text processing library for Scala.
☆28 · Mar 3, 2016 · Updated 10 years ago
tnq177 / witwicky
Witwicky: An implementation of Transformer in PyTorch.
☆22 · Aug 17, 2020 · Updated 5 years ago
braunefe / Gargantua
☆12 · Dec 9, 2015 · Updated 10 years ago
vchahun / galechurch
Gale&Church (1993) sentence alignment
☆16 · May 9, 2020 · Updated 6 years ago
taolei87 / RBGParser
Graph-based Dependency Parser
☆47 · Jan 25, 2016 · Updated 10 years ago
facebookresearch / evaluation-of-nmt-bt
This repository contains additional reference translations for the WMT'14 En-De (newstest2014) and WMT'19 En-Ru (newstest2019) test sets …
☆15 · Aug 31, 2021 · Updated 4 years ago
cfedermann / Appraise
Appraise evaluation system for manual evaluation of machine translation output
☆77 · May 7, 2021 · Updated 5 years ago
asaluja / spectral-scfg
Latent-variable Synchronous Context-Free Grammar Toolkit
☆10 · Sep 30, 2014 · Updated 11 years ago
jonsafari / nmt-list
A list of Neural MT implementations
☆364 · Jul 27, 2022 · Updated 3 years ago
isi-nlp / Zoph_RNN
C++/CUDA toolkit for training sequence and sequence-to-sequence models across multiple GPUs
☆185 · May 15, 2017 · Updated 9 years ago
ldmt-muri / alignment-with-openfst
☆21 · Dec 9, 2016 · Updated 9 years ago
jonsafari / tok-tok
A fast, simple, multilingual tokenizer
☆29 · May 24, 2017 · Updated 9 years ago
gregdurrett / berkeley-entity
The Berkeley Entity Resolution System jointly solves the problems of named entity recognition, coreference resolution, and entity linking…
☆188 · Dec 7, 2019 · Updated 6 years ago
jonsafari / clustercat
Fast Word Clustering Software
☆79 · Feb 8, 2025 · Updated last year
tastyminerals / ccrawl
Simple CORPORA list crawler
☆11 · Dec 2, 2016 · Updated 9 years ago
alvations / lazyme
Lazy python recipes.
☆10 · Apr 17, 2026 · Updated 3 months ago
ajinkyakulkarni14 / How-I-Extracted-TED-talks-for-parallel-Corpus-
☆34 · Nov 29, 2016 · Updated 9 years ago
Unbabel / BConTrasT
☆20 · Aug 17, 2021 · Updated 4 years ago
salesforce / localization-xml-mt
A High-Quality Multilingual Dataset for Structured Documentation Translation
☆39 · May 1, 2025 · Updated last year
raymondhs / fairseq-laser
My implementation of LASER architecture in Fairseq
☆12 · Oct 6, 2020 · Updated 5 years ago
bitextor / bicleaner
Bicleaner is a parallel corpus classifier/cleaner that aims at detecting noisy sentence pairs in a parallel corpus.
☆160 · Jun 18, 2024 · Updated 2 years ago
qipeng / wikiextractor
A tool for extracting plain text from Wikipedia dumps
☆15 · Sep 13, 2018 · Updated 7 years ago
jhclark / multeval
Easy Bootstrap Resampling and Approximate Randomization for BLEU, METEOR, and TER using Multiple Optimizer Runs. This implements "Better …
☆205 · Feb 25, 2023 · Updated 3 years ago
EdinburghNLP / nematus
Open-Source Neural Machine Translation in Tensorflow
☆805 · Dec 9, 2022 · Updated 3 years ago
rsennrich / Bleualign
Machine-Translation-based sentence alignment tool for parallel text
☆316 · Mar 18, 2021 · Updated 5 years ago