victorskl / genomic-bigdata-spark
Genomic BigData Warehousing with Apache Spark and LakeHouse Architecture
☆11 · Updated 2 years ago
Alternatives and similar repositories for genomic-bigdata-spark
Users who are interested in genomic-bigdata-spark are comparing it to the libraries listed below.
- This is a repo for migration of CROssBAR data to the Neo4j database via BioCypher ☆9 · Updated last month
- VCF Observer is a VCF file analysis, comparison, and visualization tool. ☆17 · Updated 4 months ago
- LLM-based gene function enrichment tool ☆11 · Updated 3 months ago
- Template for creating a BioCypher-driven knowledge graph ☆12 · Updated 5 months ago
- Very large scale k-mer counting and analysis on Apache Spark. ☆18 · Updated last year
- An option to spin up cost-effective EMR clusters on AWS with Hail and Jupyter Notebook installed ☆16 · Updated 4 years ago
- ☆18 · Updated 11 months ago
- GECO (Gene Expression Clustering Optimization; theGECOapp.com) is a minimalistic GUI app that utilizes non-linear reduction techniques to… ☆9 · Updated last year
- The advanced implementation for BioChatter, using Next.js ☆13 · Updated 4 months ago
- Feature Annotation Location Description Ontology ☆34 · Updated 5 years ago
- Standard for describing and searching biomedical data developed by the Global Alliance for Genomics & Health. ☆24 · Updated last year
- SPROUT is a machine learning tool to predict the DNA repair outcome in CRISPR experiments. ☆16 · Updated 3 years ago
- Tool for finding matches to degenerate sequence motifs in FASTA files. ☆13 · Updated last year
- DuckDB Extension for working with bioinformatic data. ☆16 · Updated last year
- Namespace encoding hierarchical relationships between proteins, protein families, and protein complexes. ☆12 · Updated 4 years ago
- Get a nicely chunked local copy of the biomedical literature (to use for other projects)! ☆14 · Updated 11 months ago
- Jinja2-enabled Jupyter notebooks ☆37 · Updated 3 weeks ago
- A bioinformatics API to interface with public multi-omics bio databases for wicked fast data integration. ☆32 · Updated 10 months ago
- Pipeline for the identification of (coding) gene structures in draft genomes. ☆28 · Updated last year
- Semantic Search ☆33 · Updated this week
- For MHC-I protein-peptide binding predictions: Deep Learning model with CNN and Snakemake workflow ☆12 · Updated 6 years ago
- NEAT (NExt-generation Analysis Toolkit) simulates next-gen sequencing reads and can learn simulation parameters from real data. ☆53 · Updated last month
- A Python3 script for removal of outlier sequences from a multiple sequence alignment (FASTA format). ☆9 · Updated 3 months ago
- ARCHIVED: this has been folded into the owlapi ☆11 · Updated 4 years ago
- Scanomatic ☆10 · Updated last year
- Python library & CLI to create, view and edit PFB files ☆11 · Updated 3 weeks ago
- Deep learning library for biological sequences. Extension of Fastai and Pytorch. ☆40 · Updated last month
- A parallel API crawler for the retrieval of Kyoto Encyclopedia of Genes and Genomes metabolic and genomics data. ☆20 · Updated last year
- Implementation of an LSTM for detecting regions of Neanderthal introgression in modern human genomes ☆9 · Updated 5 years ago
- Viral Identification and Discovery - A viral characterization pipeline built in Nextflow. ☆11 · Updated 5 years ago