technion-cs-nlp / BiologicalTokenizers
Effect of tokenization on transformers for biological sequences
☆12 · Updated 5 months ago
Related projects:
- Namespace encoding hierarchical relationships between proteins, protein families, and protein complexes. ☆12 · Updated 3 years ago
- Tokenizers and Machine Learning Models for biological sequence data ☆22 · Updated last week
- MOJITOO: a fast and universal method for integration of multimodal single cell data ☆9 · Updated 11 months ago
- Very large scale k-mer counting and analysis on Apache Spark. ☆17 · Updated 7 months ago
- Fast, sensitive and accurate protein remote homology search on GPUs ☆15 · Updated 4 months ago
- Code for the MSB publication: Exploring amino acid functions and positional subtypes in a deep mutational landscape ☆10 · Updated 2 years ago
- Application for semi-automated genomic annotation. ☆13 · Updated 2 months ago
- A Shiny-based framework to analyze and visualize interactively genomic data ☆10 · Updated last year
- ☆11 · Updated last month
- GRAph-based Finding of Individual Motif Occurrences ☆27 · Updated 3 weeks ago
- Deep learning library for biological sequences. Extension of Fastai and Pytorch. ☆40 · Updated last month
- A fast and space-efficient pre-filter for estimating the quantification of very large collections of nucleotide sequences ☆13 · Updated last year
- Fast and memory-efficient clustering + coreset construction, including fast distance kernels for Bregman and f-divergences. ☆33 · Updated last year
- ☆14 · Updated last year
- A novel biclustering algorithm for analyses of transcriptomic data. ☆17 · Updated 5 years ago
- Tool for finding matches to degenerate sequence motifs in FASTA files. ☆12 · Updated 6 months ago
- ☆24 · Updated 2 years ago
- Major Histocompatibility Complex (MHC) Binding Affinity Prediction ☆9 · Updated 3 years ago
- Categorical Variational Autoencoders ☆22 · Updated 2 years ago
- Boiler: a software tool for highly efficient, lossy compression of RNA-seq alignments ☆13 · Updated 8 years ago
- Literature mining for T cell relations ☆23 · Updated 2 years ago
- Parsing MHC nomenclature in the wild ☆16 · Updated 5 months ago
- Library for visualising genomic features in Python. ☆15 · Updated 7 years ago
- Proteins as words, genomes as documents. ☆20 · Updated 3 years ago
- Repository for "Nearest neighbor search on embeddings rapidly identifies distant protein relations" ☆11 · Updated last year
- Identification of differentially methylated genes in biomedical data ☆13 · Updated 5 years ago
- ☆14 · Updated last year
- Snakemake + Singularity + SLURM ☆9 · Updated 2 years ago
- Similarity search in heterogeneous knowledge graphs using meta paths. ☆23 · Updated last year
- Prediction of virus-host association using protein language models and multiple instance learning ☆10 · Updated 3 months ago