Tools for extracting parallel corpora from article titles across languages in Wikipedia
☆74 · Feb 25, 2015 · Updated 11 years ago
Alternatives and similar repositories for wikipedia-parallel-titles
Users who are interested in wikipedia-parallel-titles are comparing it to the libraries listed below.
- Lab exercises for the DL4MT winter school at DCU ☆15 · Oct 21, 2015 · Updated 10 years ago
- ☆44 · Nov 30, 2017 · Updated 8 years ago
- Neural machine translation implementation using dynet's python bindings ☆17 · Jan 24, 2018 · Updated 8 years ago
- Fast structured perceptron sequential labeler ☆15 · Dec 8, 2015 · Updated 10 years ago
- Efficient Markov Chain word alignment ☆53 · Aug 1, 2021 · Updated 4 years ago
- Decoding platform for machine translation research ☆54 · Aug 24, 2019 · Updated 6 years ago
- ☆12 · Dec 9, 2015 · Updated 10 years ago
- A repo for sharing language resources related to the outbreak (in machine readable format) ☆25 · Sep 22, 2025 · Updated 5 months ago
- A little text processing library for Scala. ☆28 · Mar 3, 2016 · Updated 10 years ago
- C++ implementation of Generalised Brown clustering and python scripts for feature generation ☆41 · Apr 8, 2016 · Updated 9 years ago
- Cynical data selection ☆20 · Jan 16, 2021 · Updated 5 years ago
- Fast C++ implementation of multiple prototype word representation training based on Huang Socher 2012 ☆21 · May 10, 2016 · Updated 9 years ago
- Dynamic data selection for neural machine translation ☆20 · Jan 28, 2018 · Updated 8 years ago
- Fast Word Clustering Software ☆79 · Feb 8, 2025 · Updated last year
- A High-Quality Multilingual Dataset for Structured Documentation Translation ☆37 · May 1, 2025 · Updated 10 months ago
- Witwicky: An implementation of Transformer in PyTorch. ☆22 · Aug 17, 2020 · Updated 5 years ago
- Simple CORPORA list crawler ☆10 · Dec 2, 2016 · Updated 9 years ago
- Gale&Church (1993) sentence alignment ☆16 · May 9, 2020 · Updated 5 years ago
- ☆21 · Dec 9, 2016 · Updated 9 years ago
- Named Entity Disambiguation for Noisy Text ☆66 · Jun 26, 2017 · Updated 8 years ago
- Graph-based Dependency Parser ☆46 · Jan 25, 2016 · Updated 10 years ago
- C++/CUDA toolkit for training sequence and sequence-to-sequence models across multiple GPUs ☆186 · May 15, 2017 · Updated 8 years ago
- Lazy python recipes. ☆10 · Apr 17, 2021 · Updated 4 years ago
- Data collection, alignment and TAUS repository ☆23 · Nov 30, 2017 · Updated 8 years ago
- BiCVM Code ☆45 · May 14, 2018 · Updated 7 years ago
- The Berkeley Entity Resolution System jointly solves the problems of named entity recognition, coreference resolution, and entity linking… ☆187 · Dec 7, 2019 · Updated 6 years ago
- cicada: a hypergraph-based toolkit for statistical machine translation based on {tree, string}-to-{tree, string} models ☆42 · Aug 9, 2021 · Updated 4 years ago
- Multilingual image description ☆45 · Feb 9, 2018 · Updated 8 years ago
- Example project showing how you can use your fast.ai based scripts to let Amazon SageMaker perform the training and hosting of your model… ☆14 · Feb 20, 2019 · Updated 7 years ago
- Crawling engine that crawls a set of top-level domains looking for documents in a list of languages ☆11 · Feb 6, 2024 · Updated 2 years ago
- Official code and data of "3AM: An Ambiguity-Aware Multi-Modal Machine Translation Dataset" ☆12 · Dec 8, 2024 · Updated last year
- Latent-variable Synchronous Context-Free Grammar Toolkit ☆10 · Sep 30, 2014 · Updated 11 years ago
- Efficient Low-Memory Aligner ☆146 · Jan 15, 2025 · Updated last year
- A BiRNN framework implemented in Python and TensorFlow to extract parallel sentences from aligned comparable corpora. ☆33 · Sep 4, 2018 · Updated 7 years ago
- ☆13 · Aug 20, 2021 · Updated 4 years ago
- ☆14 · Apr 12, 2017 · Updated 8 years ago
- Code for the paper Faster Phrase-Based Decoding by Refining Feature State ☆14 · Jan 9, 2023 · Updated 3 years ago
- A tool for extracting plain text from Wikipedia dumps ☆15 · Sep 13, 2018 · Updated 7 years ago
- This repository contains additional reference translations for the WMT'14 En-De (newstest2014) and WMT'19 En-Ru (newstest2019) test sets … ☆15 · Aug 31, 2021 · Updated 4 years ago