Randomly sample lines from massive text files efficiently
☆17 · Apr 1, 2015 · Updated 11 years ago
Alternatives and similar repositories for subsample
Users that are interested in subsample are comparing it to the libraries listed below.
- finite-state toolkit, EM and Bayesian (Gibbs sampling) training for FST and context-free derivation forests ☆14 · Jan 24, 2017 · Updated 9 years ago
- BERT models with tokenization for Japanese texts. ☆14 · Nov 15, 2019 · Updated 6 years ago
- Text readability metrics in Python. ☆11 · Aug 29, 2013 · Updated 12 years ago
- Repository collecting resources and best practices to improve experimental rigour in deep learning research. ☆27 · Mar 30, 2023 · Updated 3 years ago
- Extensions to torch distributions ☆19 · Apr 22, 2022 · Updated 3 years ago
- Rust python bindings for symspell ☆21 · Dec 25, 2023 · Updated 2 years ago
- Symmetrized word alignment models, based on mgizapp and GIZA++ ☆14 · Jun 23, 2014 · Updated 11 years ago
- a ducttape workflow for neural machine translation ☆14 · Mar 23, 2021 · Updated 5 years ago
- Code and data for: Low Resource Grammatical Error Correction Using Wikipedia Edits (WNUT 2018) ☆17 · Jul 16, 2024 · Updated last year
- A command line tool to display TSV data in tree-like format ☆13 · Jun 19, 2021 · Updated 4 years ago
- Compile-time Hash Map for C++ ☆16 · Dec 25, 2022 · Updated 3 years ago
- Experimental collections library ☆14 · Mar 27, 2019 · Updated 7 years ago
- D implementation of xxhash ☆17 · Oct 1, 2025 · Updated 6 months ago
- Jupyter wire protocol implementation enabling D plugins to be jupyter kernels ☆13 · Apr 9, 2021 · Updated 5 years ago
- A small library and cli to bypass code.dlang.org in a way transparent to dub ☆13 · Oct 28, 2024 · Updated last year
- Word Sense Induction with BERT MLM ☆28 · Jul 6, 2023 · Updated 2 years ago
- Statistical mixed effects models in Ruby ☆21 · Jul 8, 2016 · Updated 9 years ago
- Word sense disambiguation test sets for NMT ☆20 · Dec 3, 2020 · Updated 5 years ago
- D header for librdkafka ☆10 · Jun 10, 2019 · Updated 6 years ago
- Script to get ACL Anthology ☆16 · Jan 2, 2025 · Updated last year
- Data Analytics Library for Python ☆17 · Mar 24, 2026 · Updated 2 weeks ago
- A plotting library in Ruby built on top of Vega and D3. ☆42 · Jun 22, 2025 · Updated 9 months ago
- This repo contains the code needed to run the R package Autotuner. Autotuner is used to identify proper parameters during metabolomics da… ☆16 · Jan 21, 2021 · Updated 5 years ago
- [!Deprecated!] Docker image of Cloud9 WebIDE; please go to https://github.com/xczh/dockerfiles/tree/master/code-server ☆23 · Oct 6, 2019 · Updated 6 years ago
- Emacs minor mode that automatically demangles C++, D, and Rust symbols ☆23 · Aug 22, 2021 · Updated 4 years ago
- Inverted file indexing and retrieval optimized for short texts. Supports auto-suggest and query segment classification. ☆34 · Jun 12, 2023 · Updated 2 years ago
- Pandoc filter to support vimwiki-special markup ☆16 · Jun 22, 2024 · Updated last year
- A package with different implementations of weighted random sampling without replacement in R ☆20 · Mar 17, 2026 · Updated 3 weeks ago
- A script for rapidly sampling a proportion of lines from a file ☆19 · Feb 5, 2026 · Updated 2 months ago
- Cosmin Bonchis's enhancements to the Ruby "Vector" and "Matrix" module and includes: LU and QR (Householder, Givens, Gram Schmidt, Hessen… ☆33 · May 8, 2015 · Updated 10 years ago
- D-language high-level wrapper for GNU MP (GMP) library ☆17 · Mar 12, 2026 · Updated 3 weeks ago
- A unit test framework for MIT Scheme in the jUnit style. ☆14 · Nov 30, 2015 · Updated 10 years ago
- Ipython/Jupyter magic for inline D code ☆20 · Apr 26, 2023 · Updated 2 years ago
- Semi-Markov Afterstate Actor-Critic (SMAAC) with Maze ☆11 · Nov 16, 2021 · Updated 4 years ago
- Contrastive evaluation of pronoun translation in neural machine translation ☆26 · Aug 22, 2019 · Updated 6 years ago
- Tokyo Metropolitan University Paraphrase Corpus (TMUP) ☆11 · Jun 12, 2017 · Updated 8 years ago