holdenk / sparklingpandas
Sparkling Pandas
☆25 · Updated 7 years ago
Alternatives and similar repositories for sparklingpandas:
Users who are interested in sparklingpandas are comparing it to the libraries listed below.
- Gaussian Mixture Model Implementation in PySpark ☆32 · Updated 10 years ago
- Wabbit Wappa is a full-featured Python wrapper for the Vowpal Wabbit machine learning utility. ☆101 · Updated 7 years ago
- A library that allows serialization of scikit-learn estimators into PMML ☆70 · Updated 5 years ago
- ☆111 · Updated 7 years ago
- PySpark for Elasticsearch ☆55 · Updated 7 years ago
- Deprecated; please use https://github.com/jcrist/skein or https://github.com/dask/dask-yarn instead ☆52 · Updated 6 years ago
- An Apache Spark-shell backend for IPython ☆105 · Updated 3 years ago
- An API for Distributed Machine Learning ☆154 · Updated 8 years ago
- This project contains the code to translate between Apache Spark and SFrame. ☆20 · Updated 8 years ago
- Vowpal Wabbit Webservice. A web service that accepts VW-formatted text and runs it through a VW daemon instance. ☆40 · Updated 8 years ago
- Yahoo!'s topic modelling framework using Latent Dirichlet Allocation ☆97 · Updated 13 years ago
- My capstone project for Galvanize (Zipfian Academy) ☆38 · Updated 6 years ago
- Distributed Matrix Library ☆71 · Updated 8 years ago
- An example of running Apache Spark using Scala in IPython Notebook ☆140 · Updated 9 years ago
- Spark-based approximate nearest neighbor search using locality-sensitive hashing ☆104 · Updated 8 years ago
- GPU Acceleration for Apache Spark ☆34 · Updated 9 years ago
- PyMC version 3 (PyMC 2 is in branch 2.3) ☆27 · Updated 10 years ago
- Assembly of fundamental statistics implemented based on Apache Spark ☆31 · Updated 9 years ago
- Code to allow running BIDMach on Spark, including HDFS integration and lightweight sparse model updates (Kylix). ☆15 · Updated 4 years ago
- Training materials for Strata, AMP Camp, etc. ☆150 · Updated 9 years ago
- [NO LONGER MAINTAINED AS OPEN SOURCE - USE SCALETEXT.COM INSTEAD] ☆108 · Updated 11 years ago
- Repo for experiments on PySpark and sklearn ☆79 · Updated 11 years ago
- LSH-based high-dimensional clustering for sets and points ☆78 · Updated 10 years ago
- Code for Criteo competition http://www.kaggle.com/c/criteo-display-ad-challenge ☆22 · Updated 10 years ago
- Distributed Streaming Quantiles (for PySpark) ☆37 · Updated 11 years ago
- C++ native client for Impala and Hive, with Python / pandas bindings ☆72 · Updated 6 years ago
- Quick summary: This code implements a spectral (third-order tensor decomposition) method for learning an LDA topic model on Spark. ☆105 · Updated 6 years ago
- Unified interface for local and distributed ndarrays ☆157 · Updated 6 years ago
- A RESTful web service that runs microtasks across multiple crowds, provides quality control techniques, and is easily extensible. ☆51 · Updated 7 years ago
- Locality Sensitive Hashing for Apache Spark ☆87 · Updated 3 years ago