CamDavidsonPilon / tdigest
t-Digest data structure in Python. Useful for percentiles and quantiles, including in distributed environments like PySpark.
☆382 Updated last year
Related projects
Alternatives and complementary repositories for tdigest
- Pandas ExtensionDType/Array backed by Apache Arrow ☆229 Updated last year
- Robustly estimate trend and periodicity in a timeseries. ☆373 Updated 6 years ago
- GAM timeseries modeling with auto-changepoint detection. Inspired by Facebook Prophet and implemented in PyMC3 ☆328 Updated 4 years ago
- A library for defensive data analysis. ☆501 Updated 4 years ago
- A new data structure for accurate on-line accumulation of rank-based statistics such as quantiles and trimmed means ☆1,992 Updated 10 months ago
- Run IPython notebooks as command-line scripts, generate HTML reports ☆449 Updated 6 years ago
- Describing statistical models in Python using symbolic formulas ☆951 Updated this week
- ☆162 Updated 3 years ago
- Open-source Python library for statistical analysis of randomised control trials (A/B tests) ☆333 Updated last year
- A Python library for unevenly-spaced time series analysis ☆529 Updated 2 months ago
- Design documents and code for the pandas 2.0 effort. ☆306 Updated 6 years ago
- Compiled Decision Trees for scikit-learn ☆224 Updated 5 months ago
- Implementation of statistical models to analyze time lagged conversions ☆258 Updated 5 months ago
- dask-searchcv is now part of dask-ml: https://github.com/dask/dask-ml ☆240 Updated 6 years ago
- A Python port of Twitter's AnomalyDetection R Package ☆365 Updated 4 years ago
- Interactive plotting for Pandas using Vega-Lite ☆344 Updated 5 years ago
- A garden for scikit-learn compatible trees ☆285 Updated 4 months ago
- Turbodbc is a Python module to access relational databases via the Open Database Connectivity (ODBC) interface. The module complies with … ☆621 Updated this week
- Python implementation of the parquet columnar file format. ☆783 Updated last week
- Fast HyperLogLog for Python. ☆99 Updated 2 months ago
- Python implementation of HyperLogLog and Sliding HyperLogLog algorithms ☆228 Updated last year
- Breakout Detection via Robust E-Statistics ☆755 Updated 7 years ago
- Confidence intervals for scikit-learn forest algorithms ☆282 Updated 3 months ago
- Numba extension for compiling Pandas data frames, Intel® Scalable Dataframe Compiler ☆646 Updated last year
- A Python library for Bayesian time series modeling ☆476 Updated 2 months ago
- A columnar data container that can be compressed. ☆959 Updated 2 years ago
- ☆459 Updated last year
- Sparkling Pandas ☆361 Updated last year
- Bulwark is a package for convenient property-based testing of pandas dataframes. ☆223 Updated 4 years ago
- A library for factorization machines and polynomial networks for classification and regression in Python. ☆245 Updated 4 years ago