alexpreynolds / sample
Performs memory-efficient reservoir sampling on very large input files delimited by newlines
☆69 · Updated 5 years ago
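Reservoir sampling picks k records uniformly at random in a single pass, without knowing the input length in advance. Only the k-slot reservoir is kept, so the whole file is never held in memory. The sketch below shows the classic algorithm (often called Algorithm R) applied to newline-delimited text. It illustrates the technique only and is not the sample tool's own code. The function names `reservoir_sample` and `sample_file` are hypothetical.

```python
import io
import random
from typing import Iterable, List, Optional, TextIO

# Hypothetical helper names for illustration; not the API of the sample tool.


def reservoir_sample(lines: Iterable[str], k: int,
                     rng: Optional[random.Random] = None) -> List[str]:
    """Return up to k items chosen uniformly at random from a stream of unknown length.

    Memory use is O(k) items: only the reservoir is stored, never the whole input.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    if rng is None:
        rng = random.Random()
    reservoir: List[str] = []
    for i, line in enumerate(lines):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(line)
        else:
            # Item i (0-based) enters the reservoir with probability k / (i + 1),
            # replacing a uniformly chosen existing slot.
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < k:
                reservoir[j] = line
    return reservoir


def sample_file(handle: TextIO, k: int, seed: Optional[int] = None) -> List[str]:
    """Sample k newline-delimited records from an open text handle."""
    rng = random.Random(seed)
    # Iterating a file object yields one line at a time; drop the trailing newline.
    return reservoir_sample((line.rstrip("\n") for line in handle), k, rng)


if __name__ == "__main__":
    data = io.StringIO("".join(f"record{n}\n" for n in range(1000)))
    print(sample_file(data, 5, seed=42))
```

If the input has fewer than k lines, every line is returned. This sketch keeps line text in the reservoir, so memory scales with k times the average line length. A variant that stores only byte offsets and re-reads the chosen lines in a second pass would trade extra I/O for lower memory. That variant is offered as a design option, not as a description of how the tool works.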
Alternatives and similar repositories for sample
Users who are interested in sample are comparing it to the repositories listed below
- utilities for indexing and sequence extraction from FASTA files ☆59 · Updated 3 years ago
- Fast and memory-efficient sequencing error corrector ☆93 · Updated last year
- Squeakr: An Exact and Approximate k-mer Counting System ☆85 · Updated 2 months ago
- Streaming algorithm for computing kmer statistics for massive genomics datasets ☆54 · Updated 5 years ago
- Cosmo is a fast, low-memory DNA assembler using a Succinct (variable order) de Bruijn Graph. ☆51 · Updated last year
- Fast calculations of linkage-disequilibrium in large-scale human cohorts ☆43 · Updated 5 years ago
- Code accompanying the publication for compressed graph annotation ☆13 · Updated 6 years ago
- Streaming relation (overlap, distance, KNN) of (any number of) sorted genomic interval sets. #golang ☆47 · Updated 4 years ago
- Minhash and maxhash library in Python, combining flexibility, expressivity, and performance. ☆21 · Updated 4 months ago
- normalize, left-align, trim, validate and clean VCF files ☆20 · Updated 9 years ago
- Software accompanying "HINGE: Long-Read Assembly Achieves Optimal Repeat Resolution" ☆64 · Updated 4 years ago
- Enhanced Artificial Genome Engine: next generation sequencing reads simulator ☆32 · Updated 4 years ago
- Fast and accurate set similarity estimation via containment min hash ☆42 · Updated 9 months ago
- Implicit Interval Tree with Interpolation Index ☆41 · Updated 2 years ago
- pythonic wrapper for libhts (moved to: https://github.com/quinlan-lab/hts-python) ☆49 · Updated 8 years ago
- Histosketching Using Little Kmers ☆56 · Updated last year
- BWT-based index for graphs ☆71 · Updated 2 months ago
- Load numpy arrays and HDF5 files from VCF (variant call format) ☆31 · Updated 7 years ago
- Efficient handling of FASTQ files from Python ☆51 · Updated 8 months ago
- Fast spliced aligner with low memory requirements ☆41 · Updated 9 years ago
- a wee tool for random access into BGZF files. ☆84 · Updated 7 years ago
- MinHash Alignment Process (MHAP, pronounced MAP): locality-sensitive hashing to detect long-read overlaps and utilities ☆96 · Updated 2 years ago
- Bonsai: Fast, flexible taxonomic analysis and classification ☆70 · Updated last year
- Ultrafast consensus module for raw de novo genome assembly of long uncorrected reads ☆17 · Updated 4 years ago
- succinct labeled graphs with collections and paths ☆15 · Updated 6 years ago
- ☆21 · Updated 10 years ago
- A genome assembler that reduces the computational time of human genome assembly from 400,000 CPU hours to 2,000 CPU hours, utilizing long… ☆65 · Updated 4 years ago
- HPG Aligner is an ultrafast and highly sensitive Next-Generation Sequencing (NGS) mapper which supports both DNA and RNA alignment ☆34 · Updated 7 years ago
- Fast and accurate genomic distances using HyperLogLog ☆160 · Updated 2 years ago
- Ococo: the first online variant and consensus caller. Call genomic consensus directly from an unsorted SAM/BAM stream. ☆47 · Updated 6 years ago