Randomly sample lines from massive text files efficiently
☆16 · Apr 1, 2015 · Updated 11 years ago
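A common way to draw a fixed-size, uniformly random sample of lines from a file too large to hold in memory is reservoir sampling. The classic scheme is often called Algorithm R. The Python sketch below is a generic illustration of that technique and is not necessarily how subsample itself works. The function name `reservoir_sample` and its parameters are hypothetical. It reads the input once and keeps only `k` lines in memory.

```python
import random
from typing import Iterable, List, Optional


def reservoir_sample(lines: Iterable[str], k: int,
                     rng: Optional[random.Random] = None) -> List[str]:
    """Return k lines chosen uniformly at random from `lines` in one pass.

    Hypothetical helper for illustration. Memory use is O(k), independent
    of the number of input lines. If the input has fewer than k lines,
    all of them are returned.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    rng = rng or random.Random()
    reservoir: List[str] = []
    for i, line in enumerate(lines):
        if i < k:
            # Fill the reservoir with the first k lines.
            reservoir.append(line)
        else:
            # Keep line i (0-based) with probability k / (i + 1):
            # pick a slot in [0, i]; a slot >= k means "discard".
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = line
    return reservoir
```

For example, `with open("big.txt") as f: sample = reservoir_sample(f, 1000)` streams the file line by line, so the file size only affects running time, not memory. Note that the returned lines are not in file order.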
Alternatives and similar repositories for subsample
Users that are interested in subsample are comparing it to the libraries listed below.
- Finite-state toolkit, EM and Bayesian (Gibbs sampling) training for FST and context-free derivation forests ☆15 · Jan 24, 2017 · Updated 9 years ago
- Project OCELoT: an Open, Collaborative Evaluation Leaderboard of Translations ☆23 · Updated this week
- Repository collecting resources and best practices to improve experimental rigour in deep learning research. ☆27 · Mar 30, 2023 · Updated 3 years ago
- Extensions to torch distributions ☆19 · Apr 22, 2022 · Updated 4 years ago
- Cross Sentence Neural Machine Translation ☆10 · Mar 26, 2018 · Updated 8 years ago
- Rust Python bindings for SymSpell ☆21 · Dec 25, 2023 · Updated 2 years ago
- Code and data for: Low Resource Grammatical Error Correction Using Wikipedia Edits (WNUT 2018) ☆17 · Jul 16, 2024 · Updated last year
- A command line tool to display TSV data in tree-like format ☆13 · Jun 19, 2021 · Updated 5 years ago
- Experimental collections library ☆14 · Mar 27, 2019 · Updated 7 years ago
- Implementation of Content Defined Chunking (CDC) in D ☆11 · May 10, 2023 · Updated 3 years ago
- D implementation of xxhash ☆17 · Oct 1, 2025 · Updated 8 months ago
- Simple associative array implementation for D (-betterC) that fits my needs. ☆14 · Mar 22, 2022 · Updated 4 years ago
- A comparison of Nim's performance against the "Faster Command Line Tools in D" blog post found here: http://dlang.org/blog/2017/05/24/fas… ☆14 · Mar 31, 2018 · Updated 8 years ago
- Word sense disambiguation test sets for NMT ☆21 · Dec 3, 2020 · Updated 5 years ago
- D header for librdkafka ☆10 · Jun 10, 2019 · Updated 7 years ago
- Library and executable for working with playlist files. ☆13 · Dec 15, 2025 · Updated 6 months ago
- Larger-Context NMT ☆13 · Aug 20, 2017 · Updated 8 years ago
- Script to get ACL Anthology ☆16 · Jan 2, 2025 · Updated last year
- A processor for command-line arguments, an alternative to Getopt, written in D ☆16 · Nov 10, 2017 · Updated 8 years ago
- Whitebox AES implementation in C++ ☆29 · Feb 16, 2020 · Updated 6 years ago
- This repo contains the code needed to run the R package Autotuner. Autotuner is used to identify proper parameters during metabolomics da… ☆16 · Jan 21, 2021 · Updated 5 years ago
- [!Deprecated!] Docker image of Cloud9 WebIDE; please move to https://github.com/xczh/dockerfiles/tree/master/code-server ☆23 · Oct 6, 2019 · Updated 6 years ago
- The extensible test runner for DLang moved to ☆16 · May 30, 2026 · Updated last month
- Inverted file indexing and retrieval optimized for short texts. Supports auto-suggest and query segment classification. ☆34 · Jun 12, 2023 · Updated 3 years ago
- Modified version of fairseq, including new implementations for criterions using reinforcement learning methods. ☆11 · Aug 14, 2019 · Updated 6 years ago
- A package with different implementations of weighted random sampling without replacement in R ☆20 · May 25, 2026 · Updated last month
- A script for rapidly sampling a proportion of lines from a file ☆19 · Apr 28, 2026 · Updated 2 months ago
- Cosmin Bonchis's enhancements to the Ruby "Vector" and "Matrix" modules, including LU and QR (Householder, Givens, Gram Schmidt, Hessen… ☆33 · May 8, 2015 · Updated 11 years ago
- CUDA FFT convolution ☆16 · Mar 18, 2015 · Updated 11 years ago
- A utility to randomize and split really huge (100+ GB) text files ☆21 · Dec 18, 2016 · Updated 9 years ago
- Contrastive evaluation of pronoun translation in neural machine translation ☆26 · Aug 22, 2019 · Updated 6 years ago
- SDK for TEASPN, a framework and a protocol for integrated writing assistance environments ☆59 · Dec 9, 2022 · Updated 3 years ago
- Non-Adversarial Unsupervised Word Translation ☆27 · Apr 2, 2020 · Updated 6 years ago
- Tokyo Metropolitan University Paraphrase Corpus (TMUP) ☆11 · Jun 12, 2017 · Updated 9 years ago
- dlang-bot for automated Bugzilla, GitHub, and Trello references ☆23 · Dec 2, 2024 · Updated last year
- Various tools to process the D programming language ☆18 · Jun 3, 2021 · Updated 5 years ago
- D port of meta tic-tac-toe game written for the GNU assembler ☆24 · Nov 22, 2018 · Updated 7 years ago
- This application shuffles the input file's lines, optionally skipping the header. It's optimized for files bigger than available RAM. ☆25 · Jan 9, 2017 · Updated 9 years ago
- Library for experimenting with state-of-the-art evaluation metrics like UScore ☆12 · May 27, 2023 · Updated 3 years ago
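Some of the listed tools sample a proportion of lines rather than a fixed number. A minimal sketch of that idea follows, assuming independent per-line (Bernoulli) sampling; it is not taken from any listed project, and the name `sample_fraction` is hypothetical. Each line is kept with probability `p`, so the input is streamed in one pass with constant memory and file order is preserved. The trade-off compared with reservoir sampling is that the output size is random (about `p * n` for `n` lines) rather than exact.

```python
import random
from typing import Iterable, Iterator, Optional


def sample_fraction(lines: Iterable[str], p: float,
                    rng: Optional[random.Random] = None) -> Iterator[str]:
    """Lazily yield each line independently with probability p.

    Hypothetical helper for illustration. Validation happens eagerly,
    before any line is read; the returned generator uses O(1) memory.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    rng = rng or random.Random()
    # random() returns a float in [0.0, 1.0), so p=0 keeps nothing
    # and p=1 keeps every line.
    return (line for line in lines if rng.random() < p)
```

For example, `with open("big.txt") as f: sys.stdout.writelines(sample_fraction(f, 0.01))` writes roughly 1% of the lines, in their original order.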