geyungjen / jentekllc
Apache Spark Application Development -- George Jen, Jen Tek LLC
☆16 Updated last year
Alternatives and similar repositories for jentekllc
Users who are interested in jentekllc are comparing it to the repositories listed below:
- Spark NLP for Streamlit ☆15 Updated 3 years ago
- PySpark phonetic and string matching algorithms ☆39 Updated last year
- Interactive notebooks containing demonstration code of the splink library ☆38 Updated last year
- Blog post on ETL pipelines with Airflow ☆23 Updated 4 years ago
- This repo demonstrates how to load a sample Parquet formatted file from an AWS S3 Bucket. A python job will then be submitted to a Apach… ☆19 Updated 8 years ago
- How to do data science with Optimus, Spark and Python. ☆19 Updated 5 years ago
- IBM Data Science Experience Desktop was built for those who want to download and play locally. Analyze, learn, and build with the tools y… ☆33 Updated 5 years ago
- Model management example using Polyaxon, Argo and Seldon ☆23 Updated 6 years ago
- Mastering Spark for Data Science, published by Packt ☆47 Updated 2 years ago
- ☆16 Updated last year
- A simple introduction to using spark ml pipelines ☆26 Updated 6 years ago
- Fully unit tested utility functions for data engineering. Python 3 only. ☆15 Updated 7 months ago
- Simple validator for submissions to DrivenData competitions ☆19 Updated 5 years ago
- Source code for the MC technical blog post "Data Observability in Practice Using SQL" ☆36 Updated 8 months ago
- Big Data Demystified meetup and blog examples ☆31 Updated 7 months ago
- Analysis of City Of Chicago Taxi Trip Dataset Using AWS EMR, Spark, PySpark, Zeppelin and Airbnb's Superset ☆15 Updated 7 years ago
- Styles for dbt on the net ☆9 Updated 3 months ago
- Build your feature store with macros right within your dbt repository ☆38 Updated 2 years ago
- H2OAI Driverless AI Code Samples and Tutorials ☆37 Updated 5 months ago
- Productivity Utilities for Data Science with Python Notebooks ☆6 Updated 5 years ago
- ☆15 Updated 5 years ago
- ☆21 Updated last year
- ☆19 Updated 4 years ago
- ☆16 Updated 4 years ago
- Materials for dask talk at PyData NYC ☆15 Updated 9 years ago
- Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on sing… ☆45 Updated 5 months ago
- ☆16 Updated 7 years ago
- ☆26 Updated last year
- Distributed Bayesian Entity Resolution in Apache Spark ☆57 Updated 3 years ago
- ☆12 Updated 4 years ago