colbyford / sparkitecture
A collection of “cookbook-style” scripts for simplifying data engineering and machine learning in Apache Spark.
☆13Updated 3 years ago
Alternatives and similar repositories for sparkitecture
Users that are interested in sparkitecture are comparing it to the libraries listed below
- PySpark phonetic and string matching algorithms☆39Updated last year
- ☆25Updated 6 years ago
- Support for generating modern platforms dynamically with services such as Kafka, Spark, Streamsets, HDFS, ....☆76Updated last week
- Binding the GDELT universe in a Spark environment☆25Updated 2 years ago
- ☆19Updated 7 years ago
- Lighthouse is a library for data lakes built on top of Apache Spark. It provides high-level APIs in Scala to streamline data pipelines an…☆61Updated 11 months ago
- A single docker image that combines Neo4j Mazerunner and Apache Spark GraphX into a powerful all-in-one graph processing engine☆46Updated 5 years ago
- Asynchronous actions for PySpark☆47Updated 3 years ago
- Apache NiFi NLP Processor☆18Updated last year
- Dremio Flight connector. Access Dremio using Arrow flight☆40Updated 4 years ago
- A Spark-based data comparison tool at scale which facilitates software development engineers to compare a plethora of pair combinations o…☆52Updated 2 months ago
- ☆10Updated 3 years ago
- Machine Learning Pipeline Stages for Spark (exposed in Scala/Java + Python)☆74Updated last year
- Data processing and modelling framework for automating tasks (incl. Python & SQL transformations).☆120Updated 2 months ago
- Egeria's Guidance on Governance as well as large media files such as presentations and movies☆106Updated 2 years ago
- Read Delta tables without any Spark☆47Updated last year
- Mastering Spark for Data Science, published by Packt☆47Updated 2 years ago
- zenvisage's foundational framework☆70Updated 2 years ago
- A JupyterLab extension providing a SQL formatter, auto-completion, and syntax highlighting for Spark SQL and Trino☆89Updated 2 months ago
- DataQuality for BigData☆144Updated last year
- Big Data ETL and Utilities for Hadoop Map Reduce, Spark and Storm☆103Updated last year
- Apache-Spark based Data Flow(ETL) Framework which supports multiple read, write destinations of different types and also support multiple…☆26Updated 4 years ago
- How to manage Slowly Changing Dimensions with Apache Hive☆55Updated 6 years ago
- ☆33Updated 10 years ago
- Real-time query spark and visualise it as graph.☆24Updated 7 years ago
- Repository of sample Databricks notebooks☆265Updated last year
- ☆107Updated 2 years ago
- A plugin to Apache Airflow to allow you to run Spark Submit Commands as an Operator☆73Updated 5 years ago
- Reproducing Distributed Systems and Experiments on Cloud☆39Updated last year
- An example PySpark project with pytest☆16Updated 7 years ago