gakhov / pdsa
Probabilistic Data Structures and Algorithms in Python
☆125 Updated 5 years ago
Alternatives and similar repositories for pdsa:
Users who are interested in pdsa compare it to the libraries listed below.
- Core C++ Sketch Library ☆230 Updated last month
- Python bindings for xorfilter (faster and smaller than bloom and cuckoo filters) ☆115 Updated 6 months ago
- Python bindings to Succinct Data Structure Library 2.0 ☆30 Updated 5 years ago
- Probabilistic data structures in python http://pyprobables.readthedocs.io/en/latest/index.html ☆116 Updated 3 months ago
- The stupidest database of all time. ☆55 Updated last week
- hooqu is a library built on top of Pandas-like Dataframes for defining "unit tests for data". This is a spiritual port of Apache Deequ to… ☆29 Updated 3 months ago
- Python implementations of the distributed quantile sketch algorithm DDSketch ☆86 Updated 6 months ago
- Apache datasketches ☆28 Updated last month
- PostgreSQL extension providing approximate algorithms based on apache/datasketches-cpp ☆86 Updated 2 months ago
- MonetDBLite as a Python Package ☆32 Updated 3 years ago
- Website for DataSketches. ☆98 Updated last week
- Dremio Flight connector. Access Dremio using Arrow flight ☆40 Updated 4 years ago
- Keyvi - the key value index. It is an in-memory FST-based data structure highly optimized for size and lookup performance. ☆242 Updated this week
- Cylon is a fast, scalable, distributed memory, parallel runtime with a Pandas like DataFrame. ☆299 Updated 9 months ago
- A work-in-progress book on Dask ☆12 Updated last year
- Parameterless and Universal FInding of Nearest Neighbors ☆59 Updated 2 weeks ago
- List of papers, reports and links of materials on Big Data and related topics. ☆38 Updated 7 years ago
- Fast HyperLogLog for Python. ☆104 Updated 2 months ago
- Example for simple Apache Arrow Flight service with Apache Spark and TensorFlow clients ☆36 Updated 4 years ago
- SetSketch: Filling the Gap between MinHash and HyperLogLog ☆49 Updated 3 years ago
- A polystore database from researchers of the Intel Science and Technology Center for Big Data ☆37 Updated 2 years ago
- Point-in-Time optimizations for Apache Spark ☆29 Updated last year
- Sketching linear classifiers over data streams with the Weight-Median Sketch (SIGMOD 2018). ☆38 Updated 6 years ago
- Friendly ML feature store ☆45 Updated 2 years ago
- Lambda Learner is a library for iterative incremental training of a class of supervised machine learning models. ☆42 Updated last year
- Embedded MonetDB with a Python frontend and fast Numpy/Pandas support ☆62 Updated 5 months ago
- A General-Purpose Counting Filter: Counting Quotient Filter ☆127 Updated last year
- Apache datasketches ☆95 Updated 2 years ago
- Materials for Apache Arrow workshop at VLDB 2019 ☆42 Updated 4 years ago
- Readings in Stream Processing ☆122 Updated 4 months ago