caizkun / mapreduce-examples
A collection of mapreduce problems and solutions
☆35Updated 8 years ago
Alternatives and similar repositories for mapreduce-examples
Users that are interested in mapreduce-examples are comparing it to the libraries listed below
- Data cleaning, pre-processing, and Analytics on a million movies using Spark and Scala.☆95Updated 4 years ago
- Classwork projects and homework done through the Udacity Data Engineering Nanodegree☆75Updated 2 years ago
- Code examples on Apache Spark using python☆108Updated 3 years ago
- Data pipeline project☆47Updated 11 months ago
- ☆152Updated 7 years ago
- Mastering Big Data Analytics with PySpark, Published by Packt☆165Updated last year
- Develop ML models to predict taxi trip duration in NYC. Ranked: Top 6% | RMSLE: 0.377 (Kaggle) | #DS☆17Updated 3 years ago
- Udacity Data Engineering Nanodegree Program☆53Updated 4 years ago
- PySpark functions and utilities with examples. Assists ETL process of data modeling☆104Updated 5 years ago
- This repository focuses on gathering and curating a list of resources to learn Hadoop for FREE.☆56Updated 7 years ago
- This repository contains an Apache Flink application for real-time sales analytics built using Docker Compose to orchestrate the necessar…☆47Updated 2 years ago
- O'Reilly Book: [Data Algorithms with Spark] by Mahmoud Parsian☆227Updated 2 years ago
- This repository implements a real-time credit card fraud detection pipeline using Kafka, Spark and Cassandra. Kafka continuously produces…☆22Updated 5 years ago
- Fundamentals of Spark with Python (using PySpark), code examples☆362Updated 3 years ago
- Stream/batch system with Hadoop, Spark on NYC taxi data | #DE☆26Updated 4 months ago
- Multi-container environment with Hadoop, Spark and Hive☆231Updated 9 months ago
- Projects done in the Data Engineering Nanodegree by Udacity.com☆273Updated 6 years ago
- ETL pipeline using pyspark (Spark - Python)☆116Updated 5 years ago
- This repository contains Spark, MLlib, PySpark and Dataframes projects☆49Updated 8 years ago
- LearningApacheSpark☆250Updated 2 years ago
- ☆170Updated 3 years ago
- PySpark Tutorial for Beginners - Practical Examples in Jupyter Notebook with Spark version 3.4.1. The tutorial covers various topics like…☆141Updated 2 years ago
- Counting Tweets Per User in Real-Time☆43Updated 8 years ago
- Because it's never too late to start taking notes and make them public...☆62Updated 8 months ago
- A real-time streaming ETL pipeline for streaming and performing sentiment analysis on Twitter data using Apache Kafka, Apache Spark and D…☆29Updated 5 years ago
- My Insight Data Engineering Fellowship project. I implemented a big data processing pipeline based on lambda architecture, that aggrega…☆509Updated 3 years ago
- Solution to all projects of Udacity's Data Engineering Nanodegree: Data Modeling with Postgres & Cassandra, Data Warehouse with Redshift,…☆57Updated 3 years ago
- All my projects on Big Data are provided☆27Updated 9 years ago
- Notes on Apache Spark (pyspark)☆297Updated 6 years ago
- Apache Spark 3 - Structured Streaming Course Material☆126Updated 2 years ago