Randomly sample lines from massive text files efficiently
☆17 · Apr 1, 2015 · Updated 11 years ago
Alternatives and similar repositories for subsample
Users that are interested in subsample are comparing it to the libraries listed below.
- finite-state toolkit, EM and Bayesian (Gibbs sampling) training for FST and context-free derivation forests ☆15 · Jan 24, 2017 · Updated 9 years ago
- A processor for KyotoCorpus, KWDLC, and AnnotatedFKCCorpus ☆10 · Jun 26, 2024 · Updated last year
- Reinforcement Learning Replications is a set of PyTorch implementations of reinforcement learning algorithms. ☆24 · Apr 4, 2026 · Updated 2 months ago
- Symmetrized word alignment models, based on mgizapp and GIZA++ ☆14 · Jun 23, 2014 · Updated 11 years ago
- Code and data for: Low Resource Grammatical Error Correction Using Wikipedia Edits (WNUT 2018) ☆17 · Jul 16, 2024 · Updated last year
- A command line tool to display TSV data in tree-like format ☆13 · Jun 19, 2021 · Updated 4 years ago
- Experimental collections library ☆14 · Mar 27, 2019 · Updated 7 years ago
- Roaring Bitmaps for D ☆11 · Dec 15, 2018 · Updated 7 years ago
- Implementation of Content Defined Chunking (CDC) in D ☆11 · May 10, 2023 · Updated 3 years ago
- D implementation of xxhash ☆17 · Oct 1, 2025 · Updated 8 months ago
- Jupyter wire protocol implementation enabling D plugins to be jupyter kernels ☆13 · Apr 9, 2021 · Updated 5 years ago
- Simple associative array implementation for D (-betterC) that fits my needs. ☆14 · Mar 22, 2022 · Updated 4 years ago
- ☆16 · Oct 24, 2017 · Updated 8 years ago
- Statistical mixed effects models in Ruby ☆21 · Jul 8, 2016 · Updated 9 years ago
- Word sense disambiguation test sets for NMT ☆21 · Dec 3, 2020 · Updated 5 years ago
- Library and executable for working with playlist files. ☆13 · Dec 15, 2025 · Updated 5 months ago
- Larger-Context NMT ☆13 · Aug 20, 2017 · Updated 8 years ago
- ☆18 · Oct 5, 2017 · Updated 8 years ago
- Data Analytics Library for Python ☆17 · Apr 28, 2026 · Updated last month
- A processor for command-line arguments, an alternative to Getopt, written in D ☆16 · Nov 10, 2017 · Updated 8 years ago
- Graph Theory library for Ruby ☆48 · Oct 23, 2019 · Updated 6 years ago
- Ruby library and tools for working with datapackages ☆11 · Aug 20, 2021 · Updated 4 years ago
- This repo contains the code needed to run the R package Autotuner. Autotuner is used to identify proper parameters during metabolomics da… ☆16 · Jan 21, 2021 · Updated 5 years ago
- Ruby interface to the GNU Scientific Library [Ruby 2.x and GSL 1.16 compatible fork of the gsl gem] ☆27 · Jun 24, 2015 · Updated 10 years ago
- Fast stand-alone C++ decoder for RNN-based NMT models ☆31 · Dec 12, 2020 · Updated 5 years ago
- Emacs minor mode that automatically demangles C++, D, and Rust symbols ☆23 · Aug 22, 2021 · Updated 4 years ago
- Inverted file indexing and retrieval optimized for short texts. Supports auto-suggest and query segment classification. ☆34 · Jun 12, 2023 · Updated 2 years ago
- A package with different implementations of weighted random sampling without replacement in R ☆20 · May 25, 2026 · Updated 2 weeks ago
- A script for rapidly sampling a proportion of lines from a file ☆19 · Apr 28, 2026 · Updated last month
- Cosmin Bonchis's enhancements to the Ruby "Vector" and "Matrix" module and includes: LU and QR (Householder, Givens, Gram Schmidt, Hessen… ☆33 · May 8, 2015 · Updated 11 years ago
- Finite state compiler, processor and helper tools used by apertium ☆21 · May 7, 2026 · Updated last month
- Extended keyboard layers for easy navigation and functionality based on Colemak ☆13 · Updated this week
- Produce a sample of lines from files. ☆20 · Jul 2, 2022 · Updated 3 years ago
- A utility to randomize and split really huge (100+ GB) text files ☆21 · Dec 18, 2016 · Updated 9 years ago
- Semi-Markov Afterstate Actor-Critic (SMAAC) with Maze ☆11 · Nov 16, 2021 · Updated 4 years ago
- Tokyo Metropolitan University Paraphrase Corpus (TMUP) ☆11 · Jun 12, 2017 · Updated 8 years ago
- dlang-bot for automated bugzilla, github, and trello references ☆23 · Dec 2, 2024 · Updated last year
- D port of meta tic-tac-toe game written for the GNU assembler ☆24 · Nov 22, 2018 · Updated 7 years ago
- Reimplementation of Facebook's DINOv2 in JAX. Inference (with pretrained weights) only; training is unsupported. ☆12 · Jun 25, 2024 · Updated last year