A command-line tool to mitigate homology-based data leakage in sequence-to-expression models
☆19 · Oct 27, 2025 · Updated 4 months ago
Alternatives and similar repositories for hashFrag
Users who are interested in hashFrag are comparing it to the repositories listed below.
- Genomic sequence preprocessing toolkit ☆13 · Jan 13, 2026 · Updated last month
- Annotated sequence data ☆11 · Feb 2, 2025 · Updated last year
- Toolset for training quantitative sequence to function models. ☆23 · Mar 15, 2024 · Updated last year
- Just another minhash implementation. ☆12 · Updated this week
- Modular cloning simulation with the MoClo framework in Python ☆12 · May 3, 2022 · Updated 3 years ago
- Mistle is a fast spectral search engine. It uses a fragment-indexing technique and SIMD intrinsics to match experimental MS2 spectra to l… ☆16 · Oct 6, 2023 · Updated 2 years ago
- Dataloader for applying sequence models to personalized genomics ☆28 · Updated this week
- ☆34 · Jan 27, 2025 · Updated last year
- A method for analyzing scATAC-seq experiments. ☆34 · Jun 20, 2025 · Updated 8 months ago
- Toolkit for training hyenaDNA-based autoregressive language models on DNA sequences. ☆50 · Oct 4, 2024 · Updated last year
- FA2021 Bootcamp course website for the incoming cohort of the Bioinformatics & Systems Biology Ph.D. program ☆19 · Sep 16, 2021 · Updated 4 years ago
- Ledidi turns any machine learning model into a biological sequence editor, allowing you to design sequences with desired properties. ☆101 · Jan 31, 2026 · Updated last month
- Eukaryotic genome annotation software. ☆22 · Feb 13, 2026 · Updated 2 weeks ago
- A Rust library for parsing, writing and manipulating Genbank sequence files ☆22 · Apr 20, 2025 · Updated 10 months ago
- Python package to plot a phylogenetic tree on an existing matplotlib axis. ☆27 · Jul 21, 2025 · Updated 7 months ago
- Parallel Construction of Suffix Arrays in Rust ☆26 · May 2, 2025 · Updated 9 months ago
- Decima is a Python library to train sequence models on single-cell RNA-seq data. ☆63 · Feb 5, 2026 · Updated 3 weeks ago
- Code repository for the study "Evaluating the representational power of pre-trained DNA language models for regulatory genomics" ☆24 · Jun 26, 2024 · Updated last year
- From genomes to phenotypes: Traitar3, the microbial trait analyzer (for Python3) ☆21 · Jul 8, 2024 · Updated last year
- MAVE-NN: genotype-phenotype maps from multiplex assays of variant effect ☆28 · Dec 11, 2025 · Updated 2 months ago
- Biological sequence analysis for the modern age. ☆263 · Updated this week
- a simple read-only sequence database, designed for short reads ☆20 · Dec 19, 2016 · Updated 9 years ago
- Protein Sequence Annotation with Language Models ☆28 · Jan 24, 2026 · Updated last month
- Evolution-inspired data augmentations for PyTorch-based models for regulatory genomics ☆25 · Jun 3, 2025 · Updated 8 months ago
- Pyranges: a Python framework for ultrafast sequence interval operations ☆43 · Feb 15, 2026 · Updated last week
- A fast dataloader for bigwig files made for machine learning ☆29 · Dec 16, 2025 · Updated 2 months ago
- Efficient and fast querying and parsing of GTDB's data ☆29 · Aug 21, 2025 · Updated 6 months ago
- Quick pre-QC knee plots for barcode based scRNAseq data ☆12 · Dec 8, 2018 · Updated 7 years ago
- PyTorch implementation of the Borzoi model from Calico, and Flashzoi, a 3x faster Borzoi enhancement. ☆97 · Nov 13, 2025 · Updated 3 months ago
- Python wrapper for wavefront alignment using WFA2-lib ☆38 · Nov 19, 2024 · Updated last year
- A Python package for mapping sequence aligned data onto protein structures ☆37 · May 26, 2021 · Updated 4 years ago
- 📜 the Great Automatic Nomenclator — The Next Million Names for Archaea and Bacteria ☆41 · Oct 28, 2025 · Updated 4 months ago
- A CUDA Library for Parallel n-body Integrations with focus on Simulations ☆17 · Jul 2, 2014 · Updated 11 years ago
- A lite implementation of tfmodisco, a motif discovery algorithm for genomics experiments. ☆87 · Sep 24, 2025 · Updated 5 months ago
- Contains the description of a file format to store kmers and associated values ☆34 · Aug 17, 2022 · Updated 3 years ago
- Multiple Bacteria Genome Compressor (MBGC) ☆11 · Feb 20, 2026 · Updated last week
- Compute strain abundance in a defined microbial community ☆10 · Jul 27, 2023 · Updated 2 years ago
- PSI-MOD ontology for modified and unmodified amino acid residues ☆14 · Jan 8, 2026 · Updated last month
- A tool for investigating alternative mRNA splicing in next generation mRNA sequence data. ☆11 · Mar 3, 2017 · Updated 8 years ago