Randomly sample lines from massive text files efficiently
☆17 · Apr 1, 2015 · updated 10 years ago
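The description above names the core problem: drawing a uniform random sample of lines from a file too large to load into memory. A standard technique for this is reservoir sampling.

The following is a minimal Python sketch of Algorithm R, offered as an illustration only. It is not taken from subsample itself, and the function name `reservoir_sample` and its parameters are hypothetical.

```python
import random
from typing import Iterable, List, Optional


def reservoir_sample(lines: Iterable[str], k: int, seed: Optional[int] = None) -> List[str]:
    """Return up to k lines chosen uniformly at random from `lines` in one pass.

    Hypothetical helper implementing reservoir sampling (Algorithm R).
    Memory is O(k) regardless of input size, so `lines` can be a file
    object that is streamed line by line.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    rng = random.Random(seed)
    reservoir: List[str] = []
    for i, line in enumerate(lines):
        if i < k:
            # Fill the reservoir with the first k lines.
            reservoir.append(line)
        else:
            # randint is inclusive, so j is uniform over i + 1 values;
            # line i therefore enters the reservoir with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = line
    return reservoir
```

Passing an open file object streams it line by line, so memory stays proportional to `k` rather than to the file size. If the input has fewer than `k` lines, every line is returned.

Algorithm R draws one random number per input line. Variants such as Algorithm L skip over runs of lines to reduce that cost on very large inputs.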
Alternatives and similar repositories for subsample
Users interested in subsample also compare it to the libraries listed below.
- finite-state toolkit, EM and Bayesian (Gibbs sampling) training for FST and context-free derivation forests · ☆14 · Jan 24, 2017 · updated 9 years ago
- BERT models with tokenization for Japanese texts. · ☆14 · Nov 15, 2019 · updated 6 years ago
- A processor for KyotoCorpus, KWDLC, and AnnotatedFKCCorpus · ☆10 · Jun 26, 2024 · updated last year
- Project OCELoT: an Open, Collaborative Evaluation Leaderboard of Translations · ☆23 · Nov 5, 2025 · updated 4 months ago
- Reinforcement Learning Replications is a set of PyTorch implementations of reinforcement learning algorithms. · ☆24 · Dec 15, 2024 · updated last year
- a ducttape workflow for neural machine translation · ☆14 · Mar 23, 2021 · updated 4 years ago
- A command line tool to display TSV data in tree-like format · ☆13 · Jun 19, 2021 · updated 4 years ago
- Experimental collections library · ☆14 · Mar 27, 2019 · updated 6 years ago
- Implementation of Content Defined Chunking (CDC) in D · ☆11 · May 10, 2023 · updated 2 years ago
- D implementation of xxhash · ☆17 · Oct 1, 2025 · updated 5 months ago
- Simple associative array implementation for D (-betterC) that fits my needs. · ☆14 · Mar 22, 2022 · updated 3 years ago
- A small library and CLI to bypass code.dlang.org in a way transparent to dub · ☆13 · Oct 28, 2024 · updated last year
- GMEG · ☆31 · Nov 21, 2024 · updated last year
- ☆16 · Oct 24, 2017 · updated 8 years ago
- A comparison of Nim's performance against the "Faster Command Line Tools in D" blog post found here: http://dlang.org/blog/2017/05/24/fas… · ☆14 · Mar 31, 2018 · updated 7 years ago
- D header for librdkafka · ☆10 · Jun 10, 2019 · updated 6 years ago
- ☆18 · Oct 5, 2017 · updated 8 years ago
- A plotting library in Ruby built on top of Vega and D3. · ☆42 · Jun 22, 2025 · updated 8 months ago
- A processor for command-line arguments, an alternative to Getopt, written in D · ☆17 · Nov 10, 2017 · updated 8 years ago
- AllenNLP model for the Kaggle toxic comments challenge · ☆32 · Jul 13, 2018 · updated 7 years ago
- Fast stand-alone C++ decoder for RNN-based NMT models · ☆31 · Dec 12, 2020 · updated 5 years ago
- The extensible test runner for DLang moved to · ☆16 · Mar 4, 2026 · updated 2 weeks ago
- Emacs minor mode that automatically demangles C++, D, and Rust symbols · ☆23 · Aug 22, 2021 · updated 4 years ago
- Inverted file indexing and retrieval optimized for short texts. Supports auto-suggest and query segment classification. · ☆34 · Jun 12, 2023 · updated 2 years ago
- Modified version of fairseq, including new implementations for criterions using reinforcement learning methods. · ☆11 · Aug 14, 2019 · updated 6 years ago
- A package with different implementations of weighted random sampling without replacement in R · ☆19 · updated this week
- A script for rapidly sampling a proportion of lines from a file · ☆19 · Feb 5, 2026 · updated last month
- ☆12 · Nov 25, 2018 · updated 7 years ago
- Cosmin Bonchis's enhancements to the Ruby "Vector" and "Matrix" modules, including LU and QR (Householder, Givens, Gram Schmidt, Hessen… · ☆33 · May 8, 2015 · updated 10 years ago
- D-language high-level wrapper for GNU MP (GMP) library · ☆17 · Mar 12, 2026 · updated last week
- Produce a sample of lines from files. · ☆19 · Jul 2, 2022 · updated 3 years ago
- IPython/Jupyter magic for inline D code · ☆20 · Apr 26, 2023 · updated 2 years ago
- Tokyo Metropolitan University Paraphrase Corpus (TMUP) · ☆11 · Jun 12, 2017 · updated 8 years ago
- dlang-bot for automated Bugzilla, GitHub, and Trello references · ☆23 · Dec 2, 2024 · updated last year
- Reimplementation of Facebook's DinoV2 in JAX. Inference (with pretrained weights) only; training is unsupported. · ☆12 · Jun 25, 2024 · updated last year
- Command line programs to save Google documents to text and LaTeX files · ☆19 · Oct 2, 2020 · updated 5 years ago
- OxLM: Oxford Neural Language Modelling Toolkit · ☆39 · Nov 6, 2015 · updated 10 years ago
- This application shuffles the lines of an input file, optionally skipping the header. It is optimized for files larger than available RAM. · ☆25 · Jan 9, 2017 · updated 9 years ago
- dlang pretty printers for GDB & LLDB for various standard types · ☆22 · Dec 24, 2025 · updated 2 months ago
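Several entries above deal with sampling lines from files, including weighted random sampling without replacement. One well-known streaming method for the weighted case is the Efraimidis-Spirakis A-Res algorithm. It works in three steps:

- Each item gets the key `u ** (1 / w)`, where `u` is uniform on [0, 1) and `w` is the item's weight.
- The `k` items with the largest keys are kept.
- Those `k` items form the sample.

The following is a minimal Python sketch under these assumptions:

- Items arrive as `(item, weight)` pairs.
- Non-positive weights are simply skipped.
- The function name `weighted_sample_without_replacement` is hypothetical.

It is not the implementation used by the R package listed above, which may offer different methods.

```python
import heapq
import random
from typing import Iterable, List, Optional, Tuple, TypeVar

T = TypeVar("T")


def weighted_sample_without_replacement(
    items: Iterable[Tuple[T, float]], k: int, seed: Optional[int] = None
) -> List[T]:
    """Pick up to k distinct items, favouring larger weights (A-Res).

    Hypothetical helper: each item with weight w > 0 gets the key
    u ** (1 / w) with u uniform on [0, 1); the k largest keys win.
    A min-heap of size k keeps memory at O(k) in a single pass.
    """
    if k <= 0:
        return []
    rng = random.Random(seed)
    # Heap entries are (key, position, item); the unique position breaks
    # ties so that items themselves are never compared.
    heap: List[Tuple[float, int, T]] = []
    for pos, (item, weight) in enumerate(items):
        if weight <= 0:
            continue  # assumption: non-positive weights are never selected
        key = rng.random() ** (1.0 / weight)
        if len(heap) < k:
            heapq.heappush(heap, (key, pos, item))
        elif key > heap[0][0]:
            # Evict the current smallest key in favour of the larger one.
            heapq.heapreplace(heap, (key, pos, item))
    # Return the selected items ordered from largest key to smallest.
    return [item for _, _, item in sorted(heap, reverse=True)]
```

Because each key depends only on its own item, the sample can be built in one pass over a stream. This fits the same memory-bounded setting as sampling lines from a very large file.

With `k = 1`, an item is chosen with probability proportional to its weight.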