snowplow-archive / snowplow.github.com
Legacy Snowplow website, switched off 25 April 2017
☆16 Updated 8 years ago
Alternatives and similar repositories for snowplow.github.com
Users who are interested in snowplow.github.com are comparing it to the repositories listed below.
- Model assisted random sampling. ☆119 Updated 5 years ago
- ☆146 Updated 9 years ago
- Sample repo for luigi tasks & config ☆36 Updated 9 years ago
- Elasticsearch entity resolution plugin based on Duke ☆209 Updated 5 years ago
- Content for architecting a data science platform for products using Luigi, Spark & Flask. ☆163 Updated 6 years ago
- Gallery of Apache Zeppelin notebooks ☆216 Updated 6 years ago
- ☆110 Updated 8 years ago
- A platform for real-time streaming search ☆102 Updated 9 years ago
- Coding exercises for Apache Spark ☆104 Updated 10 years ago
- REST web service for scoring PMML models ☆50 Updated 12 years ago
- Learn the pyspark API through pictures and simple examples ☆170 Updated 5 years ago
- PySpark for Elastic Search ☆55 Updated 8 years ago
- Rudimentary Bayesian Beta-Bernoulli A/B testing inference and visualization code. ☆64 Updated 11 years ago
- A curated list of all the awesome examples, articles, tutorials and videos for Apache Airflow. ☆96 Updated 5 years ago
- Example unit tests for Apache Spark Python scripts using the py.test framework ☆84 Updated 9 years ago
- Google BigQuery support for Spark, SQL, and DataFrames ☆156 Updated 6 years ago
- A guide on how to set up Jupyter with Pyspark painlessly on AWS EC2 clusters, with S3 I/O support ☆260 Updated 8 years ago
- Data models for snowplow analytics. ☆129 Updated last month
- Repository with examples and smoke tests for the GCP Airflow operators and hooks ☆152 Updated 9 years ago
- Natural Language Processing with Spark's MLlib ☆63 Updated 8 years ago
- Arbalest is a Python data pipeline orchestration library for Amazon S3 and Amazon Redshift. It automates data import into Redshift and ma… ☆40 Updated 10 years ago
- An easily-deployable, single-instance version of Snowplow ☆129 Updated last month
- Csv2Hive is a useful CSV schema finder for Big Data. It automatically discovers schemas in big CSV files, generates the 'CREATE TABL… ☆27 Updated 8 years ago
- Tools, wrappers, etc... for data science with a concentration on text processing ☆207 Updated 3 years ago
- Docker images for Snowplow, Iglu and associated projects ☆61 Updated 4 years ago
- Python SDK for working with Snowplow enriched events in Spark, AWS Lambda et al. ☆21 Updated last year
- A prototype of Hive UDFs/UDTFs that execute nested SQL queries within rows. ☆54 Updated 10 years ago
- Framework for setting up predictive analytics services ☆488 Updated 2 years ago
- Some class materials for a data processing course using PySpark ☆52 Updated 3 years ago
- Simple Spark example of generating table stats for use of data quality checks ☆28 Updated 8 years ago