big-data-lab-team / paper-big-data-engines
A paper comparing Dask and Spark
☆10 Updated 2 years ago
Alternatives and similar repositories for paper-big-data-engines
Users who are interested in paper-big-data-engines are comparing it to the libraries listed below.
- ☆22 Updated 9 months ago
- The Baseline Site Selection Tool implements simulation tools for clinical trial enrollment. ☆18 Updated 2 years ago
- Open Targets Library ETL Pipeline | Apache Beam ☆16 Updated 4 years ago
- The PEDSnet Data Quality Assessment Toolkit (OMOP CDM) ☆24 Updated 4 years ago
- NXOntology data: making ontologies accessible as simple JSON files ☆14 Updated last month
- Vocabulary lookup tool for LLMs (and humans) ☆12 Updated 2 years ago
- articat: data artifact catalog ☆17 Updated 4 months ago
- jinja2-enabled jupyter notebooks ☆37 Updated 2 weeks ago
- Hephaestus - ETL and ML tools for OHDSI - OMOP CDM ☆13 Updated 2 years ago
- Attempts to create a state of the art language model on clinical and medical text data. ☆12 Updated 6 years ago
- Distance computations with Dask (akin to scipy.spatial.distance) ☆8 Updated 7 years ago
- The documentation for the Clustergrammer project ☆10 Updated 4 years ago
- OHDSI Ananke - A Tool for Mapping Between OHDSI Concept Identifiers to Unified Medical Language System (UMLS) identifiers ☆14 Updated 4 years ago
- A repository containing an introduction to Panel, made to support videos and talks. ☆56 Updated 3 years ago
- Clinical NLP workshop for ODSC ☆39 Updated 5 years ago
- ☆10 Updated 4 years ago
- HDR UK OSS Contributions ☆23 Updated 4 months ago
- Instant search for and access to many datasets in Pyspark. ☆34 Updated 2 years ago
- Easy Interactive Data Profiling for Big Data (and Small Data) ☆14 Updated 11 years ago
- Powerful rapid automatic EDA and feature engineering library with a very easy to use API 🌟 ☆53 Updated 3 years ago
- Outcomes Insights' Data Model for Clinical Research ☆19 Updated 2 months ago
- Platform enabling Rapid Annotation for Clinical Entity Recognition ☆50 Updated 3 years ago
- Pre-modelling analysis of the data through exploratory data analysis and statistical tests. ☆51 Updated last year
- Unified slicing for all Python data structures. ☆35 Updated 5 months ago
- Primrose modeling framework for simple production models ☆32 Updated last year
- RAPIDS data science. No setup required. ☆21 Updated 4 years ago
- OntoBrowser is a web-based application for managing ontologies ☆42 Updated 2 years ago
- This repo is an approach to TDD in machine learning model operation. It covers project structure, testing essentials using pytest with Gi… ☆15 Updated 4 years ago
- ETL Tool for converting datasets to OMOP CDM ☆33 Updated 2 years ago
- TileDB integrations for machine learning data and model i/o (PyTorch, TensorFlow, Scikit-Learn) ☆25 Updated 4 months ago