dcjones / subsample
Randomly sample lines from massive text files efficiently
☆17 Updated 10 years ago
Alternatives and similar repositories for subsample
Users who are interested in subsample are comparing it to the libraries listed below.
- Fast Word Clustering Software ☆78 Updated 3 months ago
- modlm: A toolkit for mixture of distributions language models ☆27 Updated 7 years ago
- A re-implementation of redpony/cdec's tokenize-anything.pl script in python ☆8 Updated 9 years ago
- Decoding platform for machine translation research ☆55 Updated 5 years ago
- Appraise evaluation system for manual evaluation of machine translation output ☆74 Updated 4 years ago
- Training scripts and recipes for Sockeye Neural Machine Translation toolkit ☆37 Updated 5 years ago
- ☆18 Updated 7 years ago
- Tools for extracting parallel corpora from article titles across languages in Wikipedia ☆73 Updated 10 years ago
- Pacaya - A Library for Hybrid Graphical Models and Neural Networks ☆44 Updated 7 years ago
- ☆21 Updated 10 years ago
- C++ implementation of Generalised Brown clustering and python scripts for feature generation ☆41 Updated 9 years ago
- Neural machine translation soft alignment visualisations for web and command line ☆72 Updated 3 years ago
- Text Simplification System and Dataset ☆122 Updated last year
- Unsupervised parsing and noun phrase identification ☆22 Updated 11 years ago
- Text Simplification System and Dataset ☆15 Updated 7 years ago
- A simple CoNLL-X to tikz-dependency converter. ☆20 Updated 12 years ago
- Automatic extraction of edited sentences from text edition histories. ☆83 Updated 3 years ago
- A dataset of sentences with ordinal labels for grammaticality ☆29 Updated 10 years ago
- ☆17 Updated 4 years ago
- Utility scripts in Python ☆37 Updated 8 months ago
- ☆44 Updated 7 years ago
- Collection of Evaluation Metrics and Algorithms for Machine Translation ☆76 Updated 7 years ago
- AROW++ An implementation of the efficient confidence-weighted classifier ☆11 Updated 4 years ago
- SALM: Suffix Array and its Applications in Empirical Language Processing by Joy ☆11 Updated 7 years ago
- Transition-based UCCA Parser ☆72 Updated 4 years ago
- Workshop on Noisy User-generated Text (W-NUT) ☆30 Updated last week
- A multilingual dependency parser based on linear programming relaxations. ☆115 Updated 6 years ago
- Code and data for paper Colorless Green Recurrent Networks Dream Hierarchically ☆92 Updated 3 years ago
- Word sense disambiguation test sets for NMT ☆19 Updated 4 years ago
- Efficient Markov Chain word alignment ☆53 Updated 3 years ago