shauryashaurya / learn-data-munging
Notes on Data Engineering with Pandas, PySpark, Dask, Ray, Arrow DataFusion, Polars etc.
☆46 · Updated last week
Alternatives and similar repositories for learn-data-munging:
Users who are interested in learn-data-munging are comparing it to the repositories listed below
- Code and materials for Effective Polars book ☆73 · Updated 10 months ago
- Cost Efficient Data Pipelines with DuckDB ☆49 · Updated 6 months ago
- Code for my "Efficient Data Processing in SQL" book. ☆56 · Updated 6 months ago
- Essential PySpark for Scalable Data Analytics, published by Packt ☆43 · Updated 2 years ago
- Public notebooks and datasets to accompany the Data Analysis with Polars course on Udemy ☆43 · Updated last year
- Deploy A/B testing infrastructure in a containerized microservice architecture for Machine Learning applications. ☆40 · Updated last month
- ☆27 · Updated 7 months ago
- Tutorials for Fugue - A unified interface for distributed computing. Fugue executes SQL, Python, and Pandas code on Spark and Dask withou… ☆112 · Updated 10 months ago
- Intro to Polars Tutorial ☆22 · Updated last year
- Example project with a CNN to train a Pokémon type classifier, adapted for DTC workshop ☆34 · Updated last year
- ☆31 · Updated last year
- Demo of Streamlit application with Databricks SQL Endpoint ☆36 · Updated 2 years ago
- Dockerized Jupyter notebook to run commands from the ML Python Cookbook ☆38 · Updated last year
- ☆66 · Updated 2 weeks ago
- Setting up an MLflow Workspace with Docker ☆27 · Updated 5 months ago
- ☆27 · Updated 2 years ago
- Code to help generate SQL for stakeholders. Code at https://www.startdataengineering.com/post/data-democratize-llm/ ☆11 · Updated 8 months ago
- Some example projects for Data Engineers to build, end-to-end. ☆27 · Updated last year
- Sample project that uses Dagster, dbt, DuckDB and Dash to visualize the Spanish car and motorcycle market ☆56 · Updated 2 years ago
- Apache Airflow Best Practices, published by Packt ☆32 · Updated 3 months ago
- Series that follows learning Apache Spark (PySpark), with quick tips and workarounds for everyday problems ☆46 · Updated last year
- Demo for CI/CD in a machine learning project ☆96 · Updated last year
- Pandas Training © MetaSnake 2022, CC BY-NC ☆18 · Updated 2 years ago
- Datasets for ML, Analysis, etc. ☆56 · Updated 3 months ago
- Machine Learning Ops Project ☆29 · Updated 10 months ago
- Step-by-step instructions to create a production-ready data pipeline ☆37 · Updated last month
- ☆15 · Updated last year
- Scaling Machine Learning in Three Week course in a collaboration with O'Reilly following the guidance of Adi Polak's book - Scaling Machi… ☆23 · Updated last year
- Build a data warehouse with dbt ☆36 · Updated 3 months ago
- This is a capstone project associated with MLOps Zoomcamp. The end goal of the project is to build an end-to-end machine learning projec… ☆13 · Updated 2 years ago