icanbwell / SparkPipelineFramework
Framework for simpler Spark Pipelines
☆10, updated this week
Related projects
Alternatives and complementary repositories for SparkPipelineFramework
- Provide an easy way with Python to protect your data sources by searching its metadata. 🛡️ (☆17, updated 2 weeks ago)
- MacIP is a versatile command-line tool for managing and changing MAC and IP addresses, offering both manual and automated options. It's d… (☆15, updated 2 weeks ago)
- Astronomer Vendor Images (☆12, updated this week)
- Easily assemble and consume modular pipelines of sequenced AI models. (☆13, updated 3 weeks ago)
- Curated list of awesome software and resources for Senzing, The First Real-Time AI for Entity Resolution. (☆51, updated 2 weeks ago)
- A CLI to manage and monitor permissions in AWS Lake Formation (☆25, updated last year)
- This Guidance demonstrates how to create an intelligent manufacturing digital thread through a combination of knowledge graph and generat… (☆17, updated 2 weeks ago)
- Sample code supporting the `Generating REST APIs from data classes in Python` blog post (☆11, updated 5 months ago)
- Generate markdown documentation based on a JSON Schema document (☆15, updated 2 weeks ago)
- Abstractions for feature engineering on large graphs of tabular data. (☆22, updated this week)
- Simple animation for PlantUML diagrams (☆14, updated 4 months ago)
- Profiles the data, validates the schema and runs data quality checks and produces a report (☆20, updated 5 years ago)
- A tool to learn JSON schema from collection of documents and generate Create table statement for Redshift (☆19, updated 3 weeks ago)
- In this pattern, data records are ingested and then modified with simple transformations such as field level substitutions and data enric… (☆12, updated 5 years ago)
- Pipeline definitions for managing data flows to power analytics at MIT Open Learning (☆37, updated this week)
- A project template for developing BYOD docker images for use in Amazon SageMaker. (☆19, updated 4 years ago)
- A collection of python utility functions (☆12, updated 4 months ago)
- (☆20, updated this week)
- Using the Parquet file format with Python (☆14, updated last year)
- (☆12, updated last year)
- (☆32, updated 8 months ago)
- A command-line interface for packaging, deploying, and running your EMR Serverless Spark jobs (☆38, updated 6 months ago)
- End-to-end DataOps platform deployed by Terraform. (☆63, updated 4 months ago)
- Delta reader for the Ray open-source toolkit for building ML applications (☆42, updated 9 months ago)
- Fully unit tested utility functions for data engineering. Python 3 only. (☆14, updated 2 months ago)
- real-time data + ML pipeline (☆54, updated 2 weeks ago)
- Debussy is an opinionated Data Architecture and Engineering framework, enabling data analysts and engineers to build better platforms and… (☆28, updated last year)
- Batteries included toolkit for data engineering. (☆32, updated 2 months ago)
- Python requirements compilation (☆14, updated last week)