cpitclaudel / dBoost
☆16 Updated 8 years ago
Related projects
Alternatives and complementary repositories for dBoost
- Sketch and LSH Index library for Java, including OPH methods as well as the Lazo method ☆13 Updated 10 months ago
- A Generalized Data Cleaning System ☆49 Updated 8 years ago
- The BART Project: Benchmarking Algorithms for (data) Repairing and Translation ☆35 Updated 11 months ago
- ☆74 Updated last year
- Affinity Propagation on Spark ☆19 Updated 3 years ago
- Explaining Inference Queries with Bayesian Optimization ☆10 Updated 3 years ago
- Yggdrasil: Faster Decision Trees Using Column Partitioning in Spark ☆31 Updated 6 years ago
- A library for exporting Spark ML models and pipelines to PFA ☆54 Updated 5 years ago
- Scalable Graph Mining ☆61 Updated last year
- Implementation of the Loopy Belief Propagation algorithm for Apache Spark ☆42 Updated 4 years ago
- Inspect ML Pipelines in Python in the form of a DAG ☆68 Updated 8 months ago
- SparkER: an Entity Resolution framework for Apache Spark ☆63 Updated 7 months ago
- Python application to set up and run streaming (contextual) bandit experiments. ☆79 Updated last year
- A Machine Learning System for Data Enrichment. ☆75 Updated 6 years ago
- Jenga is an experimentation library that allows data science practitioners and researchers to study the effect of common data corruptio… ☆35 Updated last year
- A library that allows serialization of SciKit-Learn estimators into PMML ☆70 Updated 5 years ago
- deep entity resolution lite version ☆11 Updated 4 years ago
- Spark Parameter Optimization and Tuning ☆31 Updated 6 years ago
- ☆49 Updated last month
- Sketching linear classifiers over data streams with the Weight-Median Sketch (SIGMOD 2018). ☆38 Updated 6 years ago
- AutoBazaar: An AutoML System from the Machine Learning Bazaar ☆32 Updated 3 years ago
- A simplified version of featuretools for Spark ☆30 Updated 5 years ago
- Project overview and links to various resources ☆17 Updated 3 years ago
- ☆162 Updated 3 years ago
- FlexMatcher is a schema matching package in Python which handles the problem of matching multiple schemas to a single mediated schema. ☆30 Updated this week
- A curated inventory of machine learning methods available on the Apache Spark platform, both in official and third party libraries. ☆65 Updated 7 years ago
- Willump Is a Low-Latency Useful Machine learning Platform. ☆43 Updated last year
- ☆38 Updated 8 years ago
- hooqu is a library built on top of Pandas-like Dataframes for defining "unit tests for data". This is a spiritual port of Apache Deequ to… ☆25 Updated this week
- Additive Groves, Bagged Trees with Feature Evaluation, Interaction Detection, Visualization of Feature Effects. ☆66 Updated 3 years ago