dcjones/subsample

Randomly sample lines from massive text files efficiently

☆16

Alternatives and similar repositories for subsample

Users who are interested in subsample are comparing it to the repositories listed below.

singletongue / japanese-bert
BERT models with tokenization for Japanese texts.
☆14 · Nov 15, 2019 · Updated 6 years ago
burrsettles / readability
Text readability metrics in Python.
☆11 · Aug 29, 2013 · Updated 12 years ago
AppraiseDev / OCELoT
Project OCELoT: an Open, Collaborative Evaluation Leaderboard of Translations
☆23 · Jul 11, 2026 · Updated last week
Kaleidophon / awesome-experimental-standards-deep-learning
Repository collecting resources and best practices to improve experimental rigour in deep learning research.
☆27 · Mar 30, 2023 · Updated 3 years ago
UnderGear / Klonamari
A mini Katamari Damacy clone in Unity
☆20 · Apr 11, 2016 · Updated 10 years ago
zoho-labs / symspell
Rust Python bindings for symspell
☆21 · Dec 25, 2023 · Updated 2 years ago
emjotde / symgiza-pp
Symmetrized word alignment models, based on mgizapp and GIZA++
☆14 · Jun 23, 2014 · Updated 12 years ago
shuoyangd / tape4nmt
A ducttape workflow for neural machine translation
☆14 · Mar 23, 2021 · Updated 5 years ago
adrianeboyd / boyd-wnut2018
Code and data for: Low Resource Grammatical Error Correction Using Wikipedia Edits (WNUT 2018)
☆17 · Jul 16, 2024 · Updated 2 years ago
mzimbres / tsvtree
A command line tool to display TSV data in tree-like format
☆13 · Jun 19, 2021 · Updated 5 years ago
dlang-stdx / collections
Experimental collections library
☆14 · Mar 27, 2019 · Updated 7 years ago
tamediadigital / librdkafka-d
☆11 · Apr 27, 2017 · Updated 9 years ago
ankuPRK / Emotion-Recognition-in-Hindi-Speech
Classifying utterances in Hindi speech in one of the 8 emotional states (anger, fear, disgust, neutral, sad, happy, surprise, sarcastic) …
☆11 · Apr 28, 2016 · Updated 10 years ago
aferust / bcaa
Simple associative array implementation for D (-betterC) that fits my needs.
☆14 · Mar 22, 2022 · Updated 4 years ago
asafamr / bertwsi
Word Sense Induction with BERT MLM
☆28 · Jul 6, 2023 · Updated 3 years ago
grammarly / GMEG
GMEG
☆33 · Nov 21, 2024 · Updated last year
DlangApache / librdkafka
D header for librdkafka
☆10 · Jun 10, 2019 · Updated 7 years ago
rsennrich / lingeval97
☆18 · Oct 5, 2017 · Updated 8 years ago
tuzhaopeng / LC-NMT
Larger-Context NMT
☆13 · Aug 20, 2017 · Updated 8 years ago
markuslaker / Argon
A processor for command-line arguments, an alternative to Getopt, written in D
☆16 · Nov 10, 2017 · Updated 8 years ago
joelgrus / kaggle-toxic-allennlp
AllenNLP model for the Kaggle toxic comments challenge
☆30 · Jul 13, 2018 · Updated 8 years ago
mikhail-barg / huge-file-processor
A utility to randomize and split really huge (100+ GB) text files
☆21 · Dec 18, 2016 · Updated 9 years ago
marian-nmt / amun
Fast stand-alone C++ decoder for RNN-based NMT models
☆31 · Dec 12, 2020 · Updated 5 years ago
microsoft / QBASHER
Inverted file indexing and retrieval optimized for short texts. Supports auto-suggest and query segment classification.
☆34 · Jun 12, 2023 · Updated 3 years ago
krlmlr / wrswoR
A package with different implementations of weighted random sampling without replacement in R
☆20 · May 25, 2026 · Updated last month
hroptatyr / sample
Produce a sample of lines from files.
☆20 · Jul 2, 2022 · Updated 4 years ago
DavidWBressler / adaptivesoftmax
☆12 · Nov 25, 2018 · Updated 7 years ago
DlangScience / PydMagic
IPython/Jupyter magic for inline D code
☆20 · Apr 26, 2023 · Updated 3 years ago
amazon-science / doc-mt-metrics
☆29 · Jul 30, 2024 · Updated last year
souryadey / morse-dataset
Generate Morse code datasets for training artificial neural networks
☆26 · Sep 5, 2020 · Updated 5 years ago
enlite-ai / maze_smaac
Semi-Markov Afterstate Actor-Critic (SMAAC) with Maze
☆11 · Nov 16, 2021 · Updated 4 years ago
dlang / dlang-bot
dlang-bot for automated Bugzilla, GitHub, and Trello references
☆23 · Dec 2, 2024 · Updated last year
kylestach / dinov2-jax
Reimplementation of Facebook's DinoV2 in JAX. Inference (with pretrained weights) only; training is unsupported.
☆13 · Jun 25, 2024 · Updated 2 years ago
pauldb89 / OxLM
OxLM: Oxford Neural Language Modelling Toolkit
☆39 · Nov 6, 2015 · Updated 10 years ago
facebookresearch / Non-adversarialTranslation
Non-Adversarial Unsupervised Word Translation
☆27 · Apr 2, 2020 · Updated 6 years ago
trufanov-nok / shuf-t
This application shuffles the lines of the input file, optionally skipping the header. It is optimized for files bigger than available RAM.
☆25 · Jan 9, 2017 · Updated 9 years ago
taasnim / unsup-word-translation
☆20 · Jun 3, 2019 · Updated 7 years ago
Pure-D / dlang-debug
D pretty printers for GDB and LLDB for various standard types
☆23 · Dec 24, 2025 · Updated 6 months ago
bitextor / bitextor
Bitextor generates translation memories from multilingual websites
☆299 · Nov 11, 2024 · Updated last year