pnavaro / big-data
Python tools for big data
☆53Updated last year
Alternatives and similar repositories for big-data
Users who are interested in big-data are comparing it to the libraries listed below.
- Notebooks that support blog posts and tech talks on Dask / Coiled.☆47Updated 3 months ago
- A Pythonic introduction to methods for scaling your data science and machine learning work to larger datasets and larger models, using th…☆115Updated 2 years ago
- Data Analysis Baseline Library☆132Updated 7 months ago
- Tries to shrink your Pandas column dtypes with no data loss so you have more spare RAM☆84Updated last year
- Phi_K correlation analyzer library☆164Updated 4 months ago
- ☆76Updated last month
- How to Interpret SHAP Analyses: A Non-Technical Guide☆54Updated 3 years ago
- CraftML is a RESTful web service for easy pipeline creation without code.☆13Updated 4 years ago
- ☆46Updated 10 months ago
- Tutorials for Fugue - A unified interface for distributed computing. Fugue executes SQL, Python, and Pandas code on Spark and Dask withou…☆113Updated last year
- Find data quality issues and clean your data in a single line of code with a Scikit-Learn compatible Transformer.☆130Updated last year
- An abstraction layer for parameter tuning☆35Updated 9 months ago
- Plugins, extensions, case studies, articles, and video tutorials for Kedro☆76Updated 5 months ago
- pipreqs with jupyter notebook support☆69Updated last year
- Comparisons of big data technologies for cleaning, manipulating, and generally wrangling data for analysis and machine learning.☆65Updated 5 years ago
- Code samples for the Effective Data Science Infrastructure book☆115Updated 2 years ago
- Public notebooks and datasets to accompany the Data Analysis with Polars course on Udemy☆42Updated last year
- Automatically export Jupyter notebooks to various file formats (.py, .html, and more) on save.☆77Updated last year
- JupyterLab extension to create GitHub commits & pull requests☆119Updated 11 months ago
- A Kedro plugin that provides drop-in replacements for the pandas datasets (e.g. modin and cuDF)☆12Updated 4 years ago
- There are always multiple ways to complete a task in Pandas. A minimal subset of the library is sufficient for almost everything.☆84Updated 2 years ago
- SparkConnect Server plugin and protobuf messages for the Amazon Deequ Data Quality Engine.☆26Updated 3 months ago
- 📝 Examples of how to use Neptune for different use cases and with various MLOps tools☆85Updated last month
- Tutorial material on machine learning with dirty data in Python☆60Updated 10 months ago
- Tutorials on creating a reproducible and maintainable data science project☆144Updated 2 years ago
- Practical Deep Learning on the Cloud, published by Packt☆41Updated 2 years ago
- Use pathlib syntax to easily work with Pandas series containing file paths.☆69Updated 2 years ago
- ☆13Updated 3 years ago
- Talks about vaex☆36Updated 2 years ago
- Companion Notebooks and Data for Data Science with Python and Dask from Manning Publications☆52Updated 4 years ago