colbyford / sparkitectureLinks

A collection of “cookbook-style” scripts for simplifying data engineering and machine learning in Apache Spark.

☆13

Alternatives and similar repositories for sparkitecture

Users that are interested in sparkitecture are comparing it to the libraries listed below

Sorting:

MrPowers / ceja
PySpark phonetic and string matching algorithms
☆39Updated last year
mahmoudparsian / pyspark-algorithms
PySpark Algorithms Book: https://www.amazon.com/dp/B07X4B2218/ref=sr_1_2
☆87Updated 5 years ago
MrPowers / spark-stringmetric
Spark functions to run popular phonetic and string matching algorithms
☆60Updated 3 years ago
mikulskibartosz / check-engine
Data validation library for PySpark 3.0.0
☆33Updated 2 years ago
tspannhw / nifi-nlp-processor
Apache NiFi NLP Processor
☆18Updated last year
jeppe742 / DeltaLakeReader
Read Delta tables without any Spark
☆47Updated last year
DataSystemsGroupUT / SPARKSQLRDFBenchmarking
A systematic Benchmarking on the performance of Spark-SQL for processing Vast RDF datasets
☆14Updated 3 years ago
datamindedbe / lighthouse
Lighthouse is a library for data lakes built on top of Apache Spark. It provides high-level APIs in Scala to streamline data pipelines an…
☆62Updated last year
autodeployai / pypmml-spark
Python PMML scoring library for PySpark as SparkML Transformer
☆22Updated 10 months ago
aamend / spark-gdelt
Binding the GDELT universe in a Spark environment
☆25Updated 2 years ago
Gaglia88 / sparker
SparkER: an Entity Resolution framework for Apache Spark
☆65Updated last year
nrpowell / grakn-movie-recommender
☆19Updated 8 years ago
spoddutur / graph-knowledge-browser
Real-time query spark and visualise it as graph.
☆24Updated 8 years ago
agile-lab-dev / DataQuality
DataQuality for BigData
☆144Updated last year
israelio / Apache-nifi-links
☆25Updated 6 years ago
PacktPublishing / Mastering-Spark-for-Data-Science
Mastering Spark for Data Science, published by Packt
☆49Updated 2 years ago
logicalclocks / hops-examples
Examples for Deep Learning/Feature Store/Spark/Flink/Hive/Kafka jobs and Jupyter notebooks on Hops
☆118Updated 2 years ago
TrivadisPF / platys-modern-data-platform
Support for generating modern platforms dynamically with services such as Kafka, Spark, Streamsets, HDFS, ....
☆77Updated this week
yaooqinn / itachi
A library that brings useful functions from various modern database management systems to Apache Spark
☆60Updated 2 years ago
FINRAOS / MegaSparkDiff
A Spark-based data comparison tool at scale which facilitates software development engineers to compare a plethora of pair combinations o…
☆52Updated 3 months ago
dimajix / flowman
Flowman is an ETL framework powered by Apache Spark. With its declarative approach, Flowman simplifies the development of complex data pi…
☆96Updated 2 weeks ago
kaiserpreusse / graphio
Bulk loading of large data sets into Neo4j
☆23Updated 2 weeks ago
MrPowers / gill
An example PySpark project with pytest
☆17Updated 8 years ago
zero323 / pyspark-asyncactions
Asynchronous actions for PySpark
☆47Updated 3 years ago
hhbyyh / DataFrameCheatSheet
Cheatsheet for Spark DataFrame
☆91Updated 5 years ago
neo4j-graph-analytics / ml-link-prediction-notebooks
Notebooks for the ML Link Prediction Course
☆14Updated 4 years ago
sparklingpandas / sparklingml
Machine Learning Pipeline Stages for Spark (exposed in Scala/Java + Python)
☆74Updated last year
justhackit / spark-utils
☆10Updated 3 years ago
SETL-Framework / setl
A simple Spark-powered ETL framework that just works 🍺
☆182Updated 2 weeks ago
kbastani / spark-neo4j
A single docker image that combines Neo4j Mazerunner and Apache Spark GraphX into a powerful all-in-one graph processing engine
☆45Updated 6 years ago