gakhov / pdsa
Probabilistic Data Structures and Algorithms in Python
☆130 Updated 5 years ago
Alternatives and similar repositories for pdsa
Users who are interested in pdsa compare it with the libraries listed below.
- Core C++ Sketch Library ☆253 Updated this week
- A polystore database from researchers of the Intel Science and Technology Center for Big Data ☆39 Updated 3 years ago
- Cylon is a fast, scalable, distributed memory, parallel runtime with a Pandas like DataFrame. ☆301 Updated this week
- FlorDB 🌻 ☆158 Updated 3 months ago
- Distribution transparent Machine Learning experiments on Apache Spark ☆91 Updated last year
- A Scalable Auto-ML System ☆55 Updated 3 years ago
- Python bindings for xorfilter (faster and smaller than bloom and cuckoo filters) ☆120 Updated last month
- Python bindings to Succinct Data Structure Library 2.0 ☆34 Updated 6 years ago
- Sketching linear classifiers over data streams with the Weight-Median Sketch (SIGMOD 2018). ☆39 Updated 7 years ago
- Lambda Learner is a library for iterative incremental training of a class of supervised machine learning models. ☆41 Updated 2 years ago
- Website for DataSketches. ☆108 Updated last week
- A platform for online learning that curtails data latency and saves you cost. ☆47 Updated 4 years ago
- ☆36 Updated 2 years ago
- Python implementations of the distributed quantile sketch algorithm DDSketch ☆89 Updated 9 months ago
- Keyvi - the key value index. It is an in-memory FST-based data structure highly optimized for size and lookup performance. ☆257 Updated this week
- Apache datasketches ☆39 Updated last week
- Parameterless and Universal FInding of Nearest Neighbors ☆59 Updated 10 months ago
- an anagram ☆137 Updated 4 years ago
- Example for simple Apache Arrow Flight service with Apache Spark and TensorFlow clients ☆37 Updated 4 years ago
- Friendly ML feature store ☆45 Updated 3 years ago
- In-Memory Analytics with Apache Arrow, published by Packt ☆104 Updated last month
- Multi-core Window-Based Stream Processing Engine ☆73 Updated 4 years ago
- Weighted MinHash implementation on CUDA (multi-gpu). ☆121 Updated 2 years ago
- Willump Is a Low-Latency Useful Machine learning Platform. ☆45 Updated 2 years ago
- hooqu is a library built on top of Pandas-like Dataframes for defining "unit tests for data". This is a spiritual port of Apache Deequ to… ☆29 Updated last year
- Distributed SQL Query Engine in Python using Ray ☆246 Updated last year
- Distributed SQL Engine in Python using Dask ☆409 Updated last year
- This repository provides Scotty, a framework for efficient window aggregations for out-of-order Stream Processing. ☆79 Updated 2 years ago
- Interactive-Speed Analytics: 200x Faster, 200x Fewer Cluster Resources, Approximate Query Processing ☆252 Updated 5 years ago
- A collection of libraries for single-pass, distributed, sublinear-space approximate aggregation and sketching algorithms. Currently: Hype… ☆165 Updated 8 months ago