gtoonstra / databook

A Facebook for data

☆26

Alternatives and similar repositories for databook

Users interested in databook are comparing it to the libraries listed below.

Wikia / discreETLy
ETLy is an add-on dashboard service on top of Apache Airflow.
☆68 · Updated 2 years ago
datamindedbe / lighthouse
Lighthouse is a library for data lakes built on top of Apache Spark. It provides high-level APIs in Scala to streamline data pipelines an…
☆62 · Updated last year
jerzygangi / forklift
🚚 ETL for Spark and Airflow
☆25 · Updated 7 years ago
rambler-digital-solutions / airflow-declarative
Airflow declarative DAGs via YAML
☆133 · Updated 2 years ago
bahchis / airflow-cookbook
Chef cookbook for the Airflow workflow management platform.
☆71 · Updated 6 years ago
industrydive / fileflow
Airflow plugin to transfer arbitrary files between operators
☆78 · Updated 7 years ago
amundsen-io / amundsendatabuilder
Data ingestion library for Amundsen to build graph and search index
☆204 · Updated last year
jrderuiter / airflow-fs
Composable filesystem hooks and operators for Apache Airflow.
☆17 · Updated 4 years ago
etsy / boundary-layer
Builds Airflow DAGs from configuration files. Powers all DAGs on the Etsy Data Platform
☆260 · Updated 2 years ago
amundsen-io / amundsenfrontendlibrary
Front-end service library for Amundsen
☆279 · Updated last month
ing-bank / rokku
Rokku project. This project acts as a proxy on top of any S3 storage solution providing services like authentication, authorization, shor…
☆70 · Updated 2 months ago
zalando-incubator / spark-json-schema
JSON schema parser for Apache Spark
☆82 · Updated 3 years ago
swoop-inc / spark-records
Bulletproof Apache Spark jobs with fast root cause analysis of failures.
☆73 · Updated 4 years ago
FRosner / drunken-data-quality
Spark package for checking data quality
☆222 · Updated 5 years ago
amundsen-io / amundsenmetadatalibrary
Metadata service library for Amundsen
☆82 · Updated 4 months ago
mara / mara-example-project-2
An example mini data warehouse for Python project stats, and a template for new projects
☆178 · Updated 5 years ago
rssanders3 / airflow-spark-operator-plugin
A plugin to Apache Airflow to allow you to run Spark Submit Commands as an Operator
☆73 · Updated 6 years ago
CoxAutomotiveDataSolutions / waimak
Waimak is an open-source framework that makes it easier to create complex data flows in Apache Spark.
☆76 · Updated last year
FINRAOS / MegaSparkDiff
A Spark-based data comparison tool at scale that helps software development engineers compare a plethora of pair combinations o…
☆52 · Updated 5 months ago
dsaidgovsg / airflow-pipeline
An Airflow Docker image preconfigured to work well with Spark and Hadoop/EMR
☆175 · Updated 5 months ago
yaooqinn / itachi
A library that brings useful functions from various modern database management systems to Apache Spark
☆60 · Updated 2 years ago
intuit / superglue
Superglue is a lineage-tracking tool built to help visualize the propagation of data through complex pipelines composed of tables, jobs …
☆159 · Updated 2 years ago
airflow-plugins / airflow_api_plugin
REST-like API exposing Airflow data and operations
☆61 · Updated 6 years ago
funkyminds / cleanframes
Type-class-based data cleansing library for Apache Spark SQL
☆78 · Updated 6 years ago
airbnb / sputnik
☆63 · Updated 6 years ago
mayur2810 / sope
Apache Spark ETL Utilities
☆39 · Updated last year
mikulskibartosz / check-engine
Data validation library for PySpark 3.0.0
☆33 · Updated 3 years ago
sparsecode / DaFlow
Apache Spark-based data flow (ETL) framework that supports multiple read and write destinations of different types and also supports multiple…
☆26 · Updated 4 years ago
snowplow / snowplow-scala-analytics-sdk
Scala SDK for working with Snowplow enriched events in Spark, AWS Lambda, Flink et al.
☆21 · Updated last year
dbt-labs / dbt-presto
[ARCHIVED] The Presto adapter plugin for dbt Core
☆33 · Updated last year