shauryashaurya / learn-data-munging
Notes on Data Engineering with Pandas, PySpark, Dask, Ray, Arrow DataFusion, Polars etc.
☆50 · Updated 4 months ago
Alternatives and similar repositories for learn-data-munging
Users that are interested in learn-data-munging are comparing it to the libraries listed below
- Code and materials for Effective Polars book ☆82 · Updated last year
- Find data quality issues and clean your data in a single line of code with a Scikit-Learn compatible Transformer. ☆133 · Updated last year
- Intro to Polars Tutorial ☆22 · Updated 2 years ago
- A FastMCP tool to search and retrieve Polars API documentation. ☆71 · Updated 6 months ago
- Cost Efficient Data Pipelines with DuckDB ☆60 · Updated 6 months ago
- Public notebooks and datasets to accompany the Data Analysis with Polars course on Udemy ☆45 · Updated 2 years ago
- Example repo to kickstart integration with mlflow recipes. ☆45 · Updated 4 months ago
- It's all in the name ☆81 · Updated 2 years ago
- Construct a modern data stack and orchestrate the workflows to create high-quality data for analytics and ML applications. ☆230 · Updated 3 years ago
- ☆28 · Updated 3 years ago
- CSV and flat-file sniffer built in Rust. ☆43 · Updated last year
- Tutorials for Fugue - A unified interface for distributed computing. Fugue executes SQL, Python, and Pandas code on Spark and Dask withou… ☆114 · Updated 3 weeks ago
- Sample projects using Ploomber. ☆86 · Updated last year
- Demo on how to use Prefect with Docker ☆27 · Updated 3 years ago
- Duke MIDS: Data Engineering and DataOps Course ☆67 · Updated 10 months ago
- A repository containing data and files for my stories on Medium.com. ☆59 · Updated 9 months ago
- Code samples for the Effective Data Science Infrastructure book ☆115 · Updated 2 years ago
- summarytools in jupyter notebook ☆111 · Updated last year
- Code for my "Efficient Data Processing in SQL" book. ☆60 · Updated last year
- Repository for the book Simplifying Machine Learning with PyCaret. ☆67 · Updated 2 years ago
- The practical use-cases of how to make your Machine Learning Pipelines robust and reliable using Apache Airflow. ☆52 · Updated 2 years ago
- A tutorial that helps Big Data Engineers ramp up faster by getting familiar with PySpark dataframes and functions. It also covers topics … ☆20 · Updated 4 years ago
- Example FastAPI app deployed to AWS with CDK. ☆16 · Updated 2 years ago
- Demo on how to use Prefect 2 in an ML project ☆41 · Updated 3 years ago
- Slides for "Feature engineering for time series forecasting" talk ☆62 · Updated 3 years ago
- ☆30 · Updated 2 years ago
- Scripts and datasets for the O'Reilly book Python Polars: The Definitive Guide ☆280 · Updated 2 months ago
- Full stack data engineering tools and infrastructure set-up ☆57 · Updated 4 years ago
- Data Analysis with Polars, Published by Packt ☆32 · Updated last year
- A repo for the book 'Streamlit for Data Science' by Tyler Richards ☆235 · Updated 2 years ago