Randomly sample lines from massive text files efficiently
☆17 · Apr 1, 2015 · Updated 11 years ago
Alternatives and similar repositories for subsample
Users that are interested in subsample are comparing it to the libraries listed below.
- finite-state toolkit, EM and Bayesian (Gibbs sampling) training for FST and context-free derivation forests ☆14 · Jan 24, 2017 · Updated 9 years ago
- BERT models with tokenization for Japanese texts. ☆14 · Nov 15, 2019 · Updated 6 years ago
- Rust python bindings for symspell ☆21 · Dec 25, 2023 · Updated 2 years ago
- Code and data for: Low Resource Grammatical Error Correction Using Wikipedia Edits (WNUT 2018) ☆17 · Jul 16, 2024 · Updated last year
- A command line tool to display TSV data in tree-like format ☆13 · Jun 19, 2021 · Updated 4 years ago
- Jupyter wire protocol implementation enabling D plugins to be jupyter kernels ☆13 · Apr 9, 2021 · Updated 5 years ago
- A small library and cli to bypass code.dlang.org in a way transparent to dub ☆13 · Oct 28, 2024 · Updated last year
- Word sense disambiguation test sets for NMT ☆20 · Dec 3, 2020 · Updated 5 years ago
- ☆18 · Oct 5, 2017 · Updated 8 years ago
- Script to get ACL Anthology ☆16 · Jan 2, 2025 · Updated last year
- Data Analytics Library for Python ☆17 · Mar 24, 2026 · Updated last month
- int8_t and int16_t matrix multiply based on https://arxiv.org/abs/1705.01991 ☆75 · Dec 30, 2023 · Updated 2 years ago
- A plotting library in Ruby built on top of Vega and D3. ☆42 · Jun 22, 2025 · Updated 10 months ago
- Graph Theory library for Ruby ☆48 · Oct 23, 2019 · Updated 6 years ago
- This repo contains the code needed to run the R package Autotuner. Autotuner is used to identify proper parameters during metabolomics da… ☆16 · Jan 21, 2021 · Updated 5 years ago
- Inverted file indexing and retrieval optimized for short texts. Supports auto-suggest and query segment classification. ☆34 · Jun 12, 2023 · Updated 2 years ago
- Modified version of fairseq, including new implementations for criterions using reinforcement learning methods. ☆11 · Aug 14, 2019 · Updated 6 years ago
- A package with different implementations of weighted random sampling without replacement in R ☆20 · Mar 17, 2026 · Updated last month
- A script for rapidly sampling a proportion of lines from a file ☆19 · Feb 5, 2026 · Updated 2 months ago
- ☆12 · Nov 25, 2018 · Updated 7 years ago
- A utility to randomize and split really huge (100+ GB) text files ☆21 · Dec 18, 2016 · Updated 9 years ago
- Ipython/Jupyter magic for inline D code ☆20 · Apr 26, 2023 · Updated 3 years ago
- Contrastive evaluation of pronoun translation in neural machine translation ☆26 · Aug 22, 2019 · Updated 6 years ago
- SDK for TEASPN, a framework and a protocol for integrated writing assistance environments ☆59 · Dec 9, 2022 · Updated 3 years ago
- Non-Adversarial Unsupervised Word Translation ☆27 · Apr 2, 2020 · Updated 6 years ago
- dlang-bot for automated bugzilla, github, and trello references ☆23 · Dec 2, 2024 · Updated last year
- ☆20 · Jun 3, 2019 · Updated 6 years ago
- Various tools to process the D programming language ☆18 · Jun 3, 2021 · Updated 4 years ago
- Reimplementation of facebook's DinoV2 in JAX. Inference (with pretrained weights) only; training is unsupported. ☆12 · Jun 25, 2024 · Updated last year
- Command line programs to save Google documents to text and LaTeX files ☆19 · Oct 2, 2020 · Updated 5 years ago
- This application shuffles the lines of the input file, optionally skipping the header. It's optimized for files bigger than available RAM. ☆25 · Jan 9, 2017 · Updated 9 years ago
- Library for experimenting with state-of-the-art evaluation metrics like UScore ☆12 · May 27, 2023 · Updated 2 years ago
- Code for "Unsupervised Cross-lingual Transfer of Word Embedding Spaces" in EMNLP 2018 ☆24 · Dec 29, 2018 · Updated 7 years ago
- CLI to convert Scrapbox page to Markdown ☆12 · Dec 4, 2025 · Updated 4 months ago
- AMI Meeting Parallel Corpus ☆11 · Dec 11, 2020 · Updated 5 years ago
- A Ruby FFI for common functions in libgphoto2 ☆35 · Sep 3, 2022 · Updated 3 years ago
- The repository has been moved to ☆11 · Sep 22, 2015 · Updated 10 years ago
- A flask web application for visualising VimWiki. VimWikiGraph creates an undirected graphs of links between VimWiki files that affords fi… ☆15 · Apr 28, 2025 · Updated last year
- Supporting example for "A Rust SentencePiece implementation" ☆20 · Jun 7, 2020 · Updated 5 years ago