JensRantil / disco-slct
A MapReduce implementation of SLCT (Simple Logfile Clustering Tool, http://ristov.users.sourceforge.net/slct/) using Disco.
☆16 · Updated 14 years ago
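SLCT clusters log lines in two passes over the data. The first pass counts (position, word) pairs and keeps those that occur in at least a support-threshold number of lines. The second pass turns each line into a cluster candidate made of its frequent pairs and keeps the candidates that reach the same threshold. Both passes fit the map/reduce pattern. The sketch below is a minimal single-process illustration under that assumption. It does not use Disco's API, and every function name and the `support` parameter are hypothetical, chosen for the example.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Set, Tuple

Pair = Tuple[int, str]        # (word position in the line, word)
Candidate = Tuple[Pair, ...]  # the frequent (position, word) pairs of one line


def map_words(line: str) -> Iterator[Tuple[Pair, int]]:
    """Pass 1 map (hypothetical name): emit ((position, word), 1) per word."""
    for pos, word in enumerate(line.split()):
        yield (pos, word), 1


def map_candidates(line: str, frequent: Set[Pair]) -> Iterator[Tuple[Candidate, int]]:
    """Pass 2 map (hypothetical name): keep only the line's frequent pairs.
    Lines without any frequent word yield nothing; SLCT treats them as outliers."""
    candidate = tuple(p for p in enumerate(line.split()) if p in frequent)
    if candidate:
        yield candidate, 1


def reduce_sum(pairs: Iterable[Tuple[object, int]]) -> Counter:
    """Reduce step shared by both passes: sum the emitted counts per key."""
    totals: Counter = Counter()
    for key, count in pairs:
        totals[key] += count
    return totals


def slct(lines: Iterable[str], support: int) -> Dict[Candidate, int]:
    """Run both passes and return clusters whose support meets the threshold."""
    lines = list(lines)
    # Pass 1: build the vocabulary of frequent (position, word) pairs.
    word_counts = reduce_sum(kv for line in lines for kv in map_words(line))
    frequent = {pair for pair, n in word_counts.items() if n >= support}
    # Pass 2: count cluster candidates built from the frequent pairs.
    cand_counts = reduce_sum(
        kv for line in lines for kv in map_candidates(line, frequent)
    )
    return {c: n for c, n in cand_counts.items() if n >= support}


def render(candidate: Candidate) -> str:
    """Show a cluster as a line pattern, with '*' marking variable positions
    between frequent words (trailing variable words are not shown)."""
    words = dict(candidate)
    return " ".join(words.get(i, "*") for i in range(max(words) + 1))


if __name__ == "__main__":
    logs = [
        "sshd login failed for alice",
        "sshd login failed for bob",
        "sshd login failed for carol",
        "kernel panic at boot",
    ]
    for cand, n in slct(logs, support=2).items():
        print(n, render(cand))  # prints: 3 sshd login failed for
```

In a distributed setting, each pass would run as a separate MapReduce job, and the set of frequent pairs from the first job would be shipped to the mappers of the second. How disco-slct actually wires this up in Disco is not shown here.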
Alternatives and similar repositories for disco-slct
Users interested in disco-slct are comparing it to the libraries listed below.
- Security log file challenge ☆28 · Updated 9 years ago
- POC IDS anomaly detection engine built with IPython notebook, matplotlib, pandas, numpy, scikit-learn, d3.js, hyperloglog implementation,… ☆79 · Updated 11 years ago
- Code reference from my Qbox blog posts. ☆87 · Updated 10 years ago
- A RESTful web service that runs microtasks across multiple crowds, provides quality control techniques, and is easily extensible. ☆52 · Updated 8 years ago
- The metric correlation component of Etsy's Kale system ☆708 · Updated 8 years ago
- SAMOA (Scalable Advanced Massive Online Analysis) is an open-source platform for mining big data streams. ☆427 · Updated 9 years ago
- Tools for writing, submitting, debugging, and monitoring Storm topologies in pure Python ☆246 · Updated 2 years ago
- Pyleus is a Python framework for developing and launching Storm topologies. ☆400 · Updated 6 years ago
- Python MapReduce library written in Cython. Visit us in #hadoopy on freenode. See the link below for documentation and tutorials. ☆241 · Updated 9 years ago
- A Python HTTP client to the Prelert Anomaly Detective Engine REST API - ARCHIVED ☆32 · Updated 9 years ago
- Python language plugin for Elasticsearch ☆103 · Updated 6 years ago
- Probabilistic Data Structures in Python (originally presented at PyData 2013) ☆55 · Updated 3 years ago
- Secondary indexing for structured and unstructured data in Big Table style databases. ☆44 · Updated 5 years ago
- Battle-tested Apache Storm Multi-Lang implementation for Python ☆70 · Updated 2 months ago
- Python Elasticsearch client ☆361 · Updated 3 years ago
- Toy single-machine implementation of the Pregel graph-based framework ☆118 · Updated 8 years ago
- Naarad is a highly configurable system analysis tool that parses and plots timeseries data for better visual correlation. Naarad was buil… ☆238 · Updated 8 years ago
- An extension of the kafka-python package that adds features like multiprocess consumers. ☆39 · Updated 2 years ago
- Scalable Machine Learning in Scalding ☆360 · Updated 7 years ago
- Code for the Kaggle Microsoft Malware Classification competition ☆249 · Updated 10 years ago
- IPython notebook that illustrates the effectiveness of machine learning algorithms in anomaly detection of netflow data (inbound/outbound DDo… ☆79 · Updated 8 years ago
- PySpark for Elasticsearch ☆55 · Updated 8 years ago
- Experimental parallel data analysis toolkit. ☆122 · Updated 3 years ago
- Unofficial Git mirror of the http://svn.whoosh.ca SVN repo ☆49 · Updated 15 years ago
- Material for talk "Machine Learning 101" https://speakerdeck.com/kastnerkyle/pycon2015 https://us.pycon.org/2015/schedule/presentation/36… ☆87 · Updated 10 years ago
- Tail a log file and send log lines automatically to a kafka topic ☆57 · Updated 13 years ago
- My capstone project for Galvanize (Zipfian Academy) ☆38 · Updated 6 years ago
- Lossy Counting and Sticky Sampling implementation for efficient frequency counts on data streams. ☆63 · Updated 9 years ago
- ☆146 · Updated 9 years ago
- Estimating how similar two sets are using MinHash (Jaccard similarity coefficient) ☆30 · Updated 12 years ago