PeachstoneIO / peachbox
Python-based data warehouse solution for the Lambda Architecture.
☆14 Updated 9 years ago
Related projects
Alternatives and complementary repositories for peachbox
- S3 backed ContentsManager for jupyter notebooks ☆13 Updated 8 years ago
- framework for making streamcorpus data ☆11 Updated 7 years ago
- Utilities and examples to assist in working with PySpark and Cassandra. ☆36 Updated 9 years ago
- This project contains the code to translate between Apache Spark and SFrame. ☆21 Updated 8 years ago
- Recommendations Serving Engine using python ☆28 Updated 9 years ago
- from zero to storm cluster for realtime classification using sklearn ☆12 Updated 10 years ago
- code and slides for my PyGotham 2016 talk, "Higher-level Natural Language Processing with textacy" ☆15 Updated 8 years ago
- High Level Kafka Scanner ☆19 Updated 7 years ago
- SQLAlchemy models and DDL and ERD generation from chop-dbhi/data-models style JSON endpoints. ☆11 Updated last year
- Collects multimedia content shared through social networks. ☆19 Updated 9 years ago
- Predicting sales with Pandas ☆15 Updated 9 years ago
- Seldon Spark Jobs ☆26 Updated 9 years ago
- SMART-Learner is a machine learning library built with researchers in mind. ☆10 Updated 8 years ago
- Source code for exploring MLlib blog post ☆11 Updated 9 years ago
- Python and Scala APIs for enhanced Spark analytics ☆11 Updated 7 years ago
- common data interchange format for document processing pipelines that apply natural language processing tools to large streams of text ☆34 Updated 8 years ago
- Visualization and summarization of a collection of documents. ☆20 Updated 2 years ago
- A distributed in-memory fabric based on shared-memory blocks and datashape. Any language can operate on the data. ☆13 Updated 8 years ago
- Open source analytics platform powered by Apache Cassandra, Spark, and Kafka ☆34 Updated 9 years ago
- python library for interacting with SolrCloud ☆36 Updated 3 years ago
- Tutorial for Deploying Anaconda Cluster and PySpark on top of Red Hat Storage GlusterFS ☆8 Updated 9 years ago
- Deprecated, please use https://github.com/jcrist/skein or https://github.com/dask/dask-yarn instead ☆53 Updated 6 years ago
- A demo of how to use PageRank with Hadoop and SociaLite to identify anomalies in Healthcare Data ☆47 Updated 8 years ago
- Task Orchestration Tool Based on SWF and boto3 ☆38 Updated 6 years ago