Some class materials for a data processing course using PySpark
☆52Dec 3, 2022Updated 3 years ago
Alternatives and similar repositories for data_processing_course
Users that are interested in data_processing_course are comparing it to the libraries listed below.
- Create LAMP Stack using terraform with AWS☆11Feb 15, 2023Updated 3 years ago
- Hadoop Examples☆10Jul 1, 2022Updated 3 years ago
- Add gevent support to DataStax Python Driver for Apache Cassandra☆11Jun 10, 2020Updated 5 years ago
- This project is mainly for learning and practicing simple HIVE commands in real time scenarios. Here we have taken some sample coffee sho…☆11Mar 1, 2018Updated 8 years ago
- All Certification and preparation, examples & others☆11Oct 18, 2018Updated 7 years ago
- Testbench for experimenting with Apache Hive at any data scale.☆64Jul 10, 2017Updated 8 years ago
- Projects from my Hadoop training sessions☆16Feb 22, 2018Updated 8 years ago
- Automated (Ansible) installation of HDP via Ambari Blueprint☆16Mar 10, 2017Updated 9 years ago
- ☆14Aug 24, 2021Updated 4 years ago
- pyspark-cassandra is a Python port of the awesome @datastax Spark Cassandra connector. Compatible w/ Spark 2.0, 2.1, 2.2, 2.3 and 2.4☆69May 14, 2026Updated last week
- Projects from Udacity Data Streaming Nanodegree☆15Aug 14, 2023Updated 2 years ago
- Ansible playbooks for Apache Spark on kube☆27Jul 20, 2017Updated 8 years ago
- Deploy Dask on Marathon☆10Feb 6, 2017Updated 9 years ago
- Set of Shell scripts to automate Linux from Scratch, based on the book 7.8☆31Jan 10, 2018Updated 8 years ago
- 21st place Kaggle solution for Santander Product Recommendation https://www.kaggle.com/c/santander-product-recommendation☆13Aug 29, 2019Updated 6 years ago
- PySpark Tutorial for Beginners on Google Colab: Hands-On Guide☆17Sep 13, 2020Updated 5 years ago
- All my projects on Big Data are provided☆27Dec 5, 2016Updated 9 years ago
- Spark application for analyzing Apache access logs and detecting anomalies, with an accompanying Medium article.☆21Jan 30, 2019Updated 7 years ago
- Finance 🏦 Data Builder 🛠️ @ postgres 🐘☆22Feb 11, 2021Updated 5 years ago
- Ansible crash course☆39May 3, 2019Updated 7 years ago
- Convex Formulation of Multiple Instance Learning from Positive and Unlabeled Bags☆10Apr 28, 2018Updated 8 years ago
- Local Development of AWS Glue with Docker and Visual Studio Code☆14Nov 29, 2021Updated 4 years ago
- Dockerfiles for vault, consul, test-kitchen, etsy-mixer, curl-loader, fwd, toolkit☆18Jul 20, 2021Updated 4 years ago
- A Scalable Data Cleaning Library for PySpark.☆29Apr 4, 2019Updated 7 years ago
- A collection of data analysis projects done using PySpark via Jupyter notebooks.☆10Oct 8, 2022Updated 3 years ago
- Rasa Chatbot using Django backend and Sockets for communication☆12Dec 8, 2022Updated 3 years ago
- Sentiment Analysis of a Twitter Topic with Spark Structured Streaming☆55Dec 12, 2018Updated 7 years ago
- My presentation at ODSC India 2018 about Deep Learning with Apache Spark☆27Sep 1, 2018Updated 7 years ago
- Power Plant ML Pipeline Application - Apache Spark☆12Dec 12, 2016Updated 9 years ago
- ansible playbook to deploy cloudera hadoop components to the cluster☆53Sep 8, 2018Updated 7 years ago
- A data engineering pipeline for digital marketers.☆11Dec 21, 2018Updated 7 years ago
- running apache spark with docker swarm☆34Feb 25, 2021Updated 5 years ago
- An 802.11 probe request and beacon sniffer.☆14Mar 16, 2021Updated 5 years ago
- A fast and low memory requirement version of PointHop and PointHop++, which is built upon Apache Spark.☆10Jul 14, 2020Updated 5 years ago
- A project for the development of rich geospatial data from the city of São Paulo for use in Machine Learning models.☆11Jul 4, 2021Updated 4 years ago
- DevOps☆16May 17, 2021Updated 5 years ago
- MeLi 2020 challenge Winner Solution☆11Dec 9, 2020Updated 5 years ago
- Python solutions to problems posted on http://codility.com/☆11Nov 13, 2013Updated 12 years ago
- Dockerizing an Apache Spark Standalone Cluster☆42Jun 29, 2022Updated 3 years ago