shauryashaurya / learn-data-munging
Notes on Data Engineering with Pandas, PySpark, Dask, Ray, Arrow DataFusion, Polars etc.
☆44 Updated last month
Related projects
Alternatives and complementary repositories for learn-data-munging
- Duke MIDS: Data Engineering and DataOps Course ☆57 Updated last year
- ☆24 Updated 4 months ago
- Deploy A/B testing infrastructure in a containerized microservice architecture for Machine Learning applications. ☆39 Updated last year
- Code and materials for Effective Polars book ☆67 Updated 7 months ago
- Demo on how to use Prefect with Docker ☆26 Updated 2 years ago
- Public notebooks and datasets to accompany the Data Analysis with Polars course on Udemy ☆41 Updated last year
- Tutorials for Fugue - A unified interface for distributed computing. Fugue executes SQL, Python, and Pandas code on Spark and Dask withou… ☆111 Updated 7 months ago
- Cost Efficient Data Pipelines with DuckDB ☆46 Updated 3 months ago
- Intro to Polars Tutorial ☆20 Updated last year
- This is your friendly fitness assistant, a RAG application built as part of LLM Zoomcamp ☆50 Updated last month
- The getting started notebook for the DTC Zoomcamp Q&A challenge ☆28 Updated 11 months ago
- Find data quality issues and clean your data in a single line of code with a Scikit-Learn compatible Transformer. ☆126 Updated 10 months ago
- Code for "Advanced data transformations in SQL" free live workshop ☆64 Updated 2 weeks ago
- Example repo to kickstart integration with mlflow recipes. ☆38 Updated 2 months ago
- Solutions for the Machine Learning Zoomcamp 2022 by DataTalks.Club. ☆24 Updated last year
- ☆24 Updated 2 years ago
- Code samples for the Effective Data Science Infrastructure book ☆110 Updated last year
- Essential PySpark for Scalable Data Analytics, published by Packt ☆43 Updated last year
- Slides for "Feature engineering for time series forecasting" talk ☆57 Updated last year
- It's all in the name ☆74 Updated last year
- An example MLFlow project ☆48 Updated 2 years ago
- This is a capstone project associated with MLOps Zoomcamp. The end goal of the project is to build an end-to-end machine learning projec… ☆13 Updated 2 years ago
- Demo for CI/CD in a machine learning project ☆93 Updated last year
- ☆30 Updated last year
- ☆29 Updated 2 years ago
- Code for blog at https://www.startdataengineering.com/post/python-for-de/ ☆55 Updated 5 months ago
- Code for my "Efficient Data Processing in SQL" book. ☆49 Updated 3 months ago
- Pandas Training © MetaSnake 2022, CC BY-NC ☆18 Updated 2 years ago
- Powerful rapid automatic EDA and feature engineering library with a very easy to use API 🌟 ☆53 Updated 2 years ago
- Notes from our NLP reading club! ☆16 Updated 3 years ago