shauryashaurya / learn-data-munging
Notes on Data Engineering with Pandas, PySpark, Dask, Ray, Arrow DataFusion, Polars etc.
☆48 Updated last month
Alternatives and similar repositories for learn-data-munging
Users who are interested in learn-data-munging are comparing it to the repositories listed below.
- Code to help generate SQL for stakeholders. Code at https://www.startdataengineering.com/post/data-democratize-llm/ ☆12 Updated last year
- Cost Efficient Data Pipelines with DuckDB ☆55 Updated 2 months ago
- Code and materials for Effective Polars book ☆83 Updated last year
- Code for my "Efficient Data Processing in SQL" book. ☆57 Updated 11 months ago
- Tutorials for Fugue - A unified interface for distributed computing. Fugue executes SQL, Python, and Pandas code on Spark and Dask withou… ☆113 Updated last year
- Find data quality issues and clean your data in a single line of code with a Scikit-Learn compatible Transformer. ☆131 Updated last year
- A FastMCP tool to search and retrieve Polars API documentation. ☆64 Updated last month
- Public notebooks and datasets to accompany the Data Analysis with Polars course on Udemy ☆42 Updated last year
- Templates for your Kedro projects. ☆77 Updated this week
- A tutorial that helps Big Data Engineers ramp up faster by getting familiar with PySpark dataframes and functions. It also covers topics … ☆20 Updated 3 years ago
- Example repo to kickstart integration with mlflow recipes. ☆44 Updated 5 months ago
- An example MLFlow project ☆48 Updated 6 months ago
- Repo for CDC with debezium blog post ☆28 Updated 10 months ago
- Deploy A/B testing infrastructure in a containerized microservice architecture for Machine Learning applications. ☆40 Updated 6 months ago
- IbisML is a library for building scalable ML pipelines using Ibis. ☆111 Updated this week
- Sample projects using Ploomber. ☆86 Updated last year
- ☆29 Updated last year
- It's all in the name ☆78 Updated 2 years ago
- Demo on how to use Prefect with Docker ☆26 Updated 2 years ago
- ☆16 Updated last year
- Explore and compare 1K+ accurate decision trees in your browser! ☆165 Updated last year
- ☆8 Updated last year
- Demo on how to use Prefect 2 in an ML project ☆41 Updated 2 years ago
- Possibly the fastest DataFrame-agnostic quality check library in town. ☆195 Updated this week
- A simple and easy to use Data Quality (DQ) tool built with Python. ☆50 Updated last year
- Code samples for the Effective Data Science Infrastructure book ☆115 Updated 2 years ago
- Adding timestamps to NumFOCUS and PyData YouTube videos! ☆95 Updated 3 years ago
- Example repo to kickstart integration with mlflow pipelines. ☆77 Updated 2 years ago
- Record matching and entity resolution at scale in Spark ☆34 Updated last year
- Plugins, extensions, case studies, articles, and video tutorials for Kedro ☆80 Updated 7 months ago