weslleylc / Feature-Store
A containerized approach using Apache Kafka, Spark, Cassandra, Hive, Jupyter, and Docker-compose.
☆14 Updated 3 years ago
Alternatives and similar repositories for Feature-Store:
Users who are interested in Feature-Store are comparing it to the libraries listed below.
- Code snippets for Data Engineering Design Patterns book ☆68 Updated last week
- A workshop with several modules to help learn Feast, an open-source feature store ☆86 Updated last month
- A workspace to experiment with Apache Spark, Livy, and Airflow in a Docker environment. ☆39 Updated 3 years ago
- Orchestrate Spark Jobs from Kubeflow Pipelines and poll for the status. ☆52 Updated 2 years ago
- Trino dbt demo project to mix and load BigQuery data with and in a local PostgreSQL database ☆72 Updated 3 years ago
- ☆28 Updated last year
- The practical use-cases of how to make your Machine Learning Pipelines robust and reliable using Apache Airflow. ☆52 Updated 2 years ago
- PySpark phonetic and string matching algorithms ☆39 Updated 11 months ago
- Great Expectations Airflow operator ☆159 Updated this week
- A Table format agnostic data sharing framework ☆38 Updated last year
- Examples for Deep Learning/Feature Store/Spark/Flink/Hive/Kafka jobs and Jupyter notebooks on Hops ☆118 Updated last year
- Repo that relates to the Medium blog 'Keeping your ML model in shape with Kafka, Airflow and MLFlow' ☆119 Updated last year
- (project & tutorial) dag pipeline tests + ci/cd setup ☆86 Updated 4 years ago
- Data validation library for PySpark 3.0.0 ☆34 Updated 2 years ago
- Playground for Lakehouse (Iceberg, Hudi, Spark, Flink, Trino, DBT, Airflow, Kafka, Debezium CDC) ☆49 Updated last year
- Makes it simple to store test results and visualise them in a BI dashboard ☆40 Updated this week
- REST API for Apache Spark on K8S or YARN ☆95 Updated this week
- Delta-Lake, ETL, Spark, Airflow ☆46 Updated 2 years ago
- A Python package that creates fine-grained dbt tasks on Apache Airflow ☆64 Updated 4 months ago
- Feast AWS guide using Redshift / Spectrum / DynamoDB to build a credit scoring model ☆61 Updated 3 years ago
- Ray provider for Apache Airflow ☆47 Updated last year
- Presto Trino with Apache Hive Postgres metastore ☆39 Updated 5 months ago
- Fast and scalable Airflow on Kubernetes setup. ☆28 Updated last year
- PyConDE & PyData Berlin 2019 Airflow Workshop: Airflow for machine learning pipelines. ☆46 Updated last year
- Read Delta tables without any Spark ☆47 Updated 11 months ago
- A Python Library to support running data quality rules while the spark job is running⚡ ☆171 Updated 3 weeks ago
- Code examples showing flow deployment to various types of infrastructure ☆104 Updated 2 years ago
- The Python fake data producer for Apache Kafka® is a complete demo app allowing you to quickly produce JSON fake streaming datasets and … ☆82 Updated 9 months ago
- Code review for data in dbt ☆485 Updated last month
- A repository of sample code to show data quality checking best practices using Airflow. ☆74 Updated last year