PiercingDan / spark-Jupyter-AWS
A guide on how to set up Jupyter with Pyspark painlessly on AWS EC2 clusters, with S3 I/O support
☆261 · Updated 7 years ago
Alternatives and similar repositories for spark-Jupyter-AWS:
Users who are interested in spark-Jupyter-AWS are comparing it to the libraries listed below.
- Deep Learning for Pugs ☆74 · Updated 7 years ago
- Content for architecting a data science platform for products using Luigi, Spark & Flask. ☆163 · Updated 5 years ago
- ☆263 · Updated 5 years ago
- VM based deployment for prototyping Big Data tools on Amazon Web Services ☆128 · Updated 4 years ago
- ☆84 · Updated 7 years ago
- Magic functions for using Jupyter Notebook with Apache Spark and a variety of SQL databases. ☆172 · Updated 6 years ago
- Curated list of all dataset websites that I find ☆84 · Updated 6 years ago
- Sample repo for luigi tasks & config ☆36 · Updated 8 years ago
- PyData Seattle 2015: Python Data Bikeshed ☆127 · Updated 9 years ago
- DePy 2015 Talk ☆117 · Updated 7 years ago
- PyData NYC 2015 conference ☆94 · Updated 9 years ago
- Repository for PyCon 2016 workshop Natural Language Processing in 10 Lines of Code ☆239 · Updated 7 years ago
- Observations from Ian on successfully delivering data science products ☆543 · Updated 3 years ago
- ☆146 · Updated 9 years ago
- Model assisted random sampling. ☆120 · Updated 4 years ago
- An implementation of JupyterHub within the Amazon cloud, with automatic scaling up and down ☆128 · Updated last year
- Directory of Jupyter notebooks exploring various topics ☆316 · Updated 8 years ago
- An external PySpark module that works like R's read.csv or pandas' read_csv, with automatic type inference and null value handling. Parse… ☆90 · Updated 9 years ago
- ☆52 · Updated 8 years ago
- A fork of the cookiecutter-data-science leveraging Docker for local development. ☆130 · Updated 5 years ago
- ☆117 · Updated 3 months ago
- How data science is woven into the fabric of Stitch Fix ☆171 · Updated last month
- Start a cluster in EC2 for dask.distributed ☆106 · Updated 4 years ago
- Arbalest is a Python data pipeline orchestration library for Amazon S3 and Amazon Redshift. It automates data import into Redshift and ma… ☆41 · Updated 9 years ago
- Open source Flotilla ☆193 · Updated this week
- Jupyter Notebook extension for Apache Spark integration ☆191 · Updated 4 years ago
- Code for Learning with Data Blog ☆64 · Updated 8 years ago
- PyData, The Complete Works of ☆299 · Updated 8 years ago
- Scripts used to setup a Spark cluster on EC2 ☆393 · Updated 7 years ago
- ☆160 · Updated 8 years ago