A repository of Apache Spark projects, training projects, and tutorials, in both Scala and Python.
☆33 · Sep 15, 2021 · Updated 4 years ago
Alternatives and similar repositories for spark
Users that are interested in spark are comparing it to the libraries listed below.
- A complete data engineering project demonstrating modern data stack practices with Apache Flink, Iceberg, Trino and Superset ☆23 · Sep 29, 2025 · Updated 6 months ago
- Web scraped Data Science Interview Questions from Towards Data Science/ Medium.com asked by FAANG/Top Product based companies in last 4-5… ☆16 · Jan 16, 2023 · Updated 3 years ago
- Use React as the front end to develop an E-Commerce website ☆17 · Nov 19, 2021 · Updated 4 years ago
- All my projects on Big Data are provided ☆27 · Dec 5, 2016 · Updated 9 years ago
- This project provides valuable customer sentiment insights for Zomato by tracking and analyzing tweets related to their brand and service… ☆14 · Aug 27, 2023 · Updated 2 years ago
- Loan Default Prediction using PySpark, with jobs scheduled by Apache Airflow and Integration with Spark using Apache Livy ☆22 · Dec 26, 2020 · Updated 5 years ago
- Data cleaning, pre-processing, and Analytics on a million movies using Spark and Scala. ☆95 · May 19, 2021 · Updated 4 years ago
- ☆64 · Jan 9, 2024 · Updated 2 years ago
- Server side API for QANTA quiz bowl system ☆10 · Jan 31, 2019 · Updated 7 years ago
- Repo for the coursera Getting and Cleaning Data Course Project ☆11 · Sep 27, 2015 · Updated 10 years ago
- Jupyter Notebook for PyData DC 2016 on "Making Your Code Faster: Cython and parallel processing in the Jupyter Notebook" ☆12 · Nov 12, 2016 · Updated 9 years ago
- Proof of concept lambda for massive parallelism ☆10 · Nov 2, 2018 · Updated 7 years ago
- ☆31 · Mar 23, 2023 · Updated 3 years ago
- A dataset of 'historical' data, useful for munging/ cleaning practice ☆13 · Mar 12, 2018 · Updated 8 years ago
- "Not too complicated" training code for CIFAR-10 by PyTorch Lightning ☆12 · Jun 5, 2022 · Updated 3 years ago
- Spring Boot Testing Examples ☆26 · Aug 18, 2023 · Updated 2 years ago
- This repository contains Spark, MLlib, PySpark and Dataframes projects ☆49 · Oct 22, 2017 · Updated 8 years ago
- These are projects for healthcare analytics. The projects are based on open data on health care. ☆17 · Oct 31, 2020 · Updated 5 years ago
- ☆11 · Dec 2, 2016 · Updated 9 years ago
- Linux Administration Bootcamp Go from Beginner to Advanced, published by Packt ☆12 · Jan 30, 2023 · Updated 3 years ago
- RELK -- The Research Elastic Stack (Kafka, Beats, Zookeeper, Logstash, ElasticSearch, Kibana, Spark, & Jupyter -- All in Docker) ☆27 · Nov 7, 2019 · Updated 6 years ago
- pyspark framework ☆25 · Feb 22, 2022 · Updated 4 years ago
- Notebooks for deep learning course ☆14 · Jan 6, 2022 · Updated 4 years ago
- Mirror of the original repo: @pushpakumar02/linkedIn-job-directory created and maintained by @pushpakumar02 ☆82 · Jul 29, 2025 · Updated 8 months ago
- Spark cluster in docker containers with sample training Jupyter notebooks ☆27 · Feb 24, 2023 · Updated 3 years ago
- Let's build an image filter app from scratch ☆11 · Apr 13, 2023 · Updated 3 years ago
- ☆15 · Aug 11, 2022 · Updated 3 years ago
- Uses RNN on the Nietzsche dataset ☆15 · May 28, 2017 · Updated 8 years ago
- Demo bot for Random Access Navigation ☆13 · May 8, 2017 · Updated 8 years ago
- Messing around with SQL and Postgres ☆12 · May 11, 2021 · Updated 4 years ago
- scikit-learn course for 2017 NGCM Summer Academy ☆17 · Jun 30, 2017 · Updated 8 years ago
- code and slides for my PyGotham 2016 talk, "Higher-level Natural Language Processing with textacy" ☆15 · Jul 17, 2016 · Updated 9 years ago
- AWS Quick Start Team ☆23 · Oct 3, 2024 · Updated last year
- Python interface for building rule-based expert systems over PyCLIPS ☆14 · Nov 18, 2022 · Updated 3 years ago
- A plug-and-play library to interface with Fawry's payment gateway API (charge, refund, payment status, service callback). ☆20 · Nov 10, 2019 · Updated 6 years ago
- For this project I am creating an ETL (Extract, Transform, and Load) pipeline using Python, RegEx, and SQL Database. The goal is to retri… ☆26 · Feb 9, 2021 · Updated 5 years ago
- Python data analysis course for 2017 NGCM Summer Academy ☆21 · Jun 28, 2017 · Updated 8 years ago
- Repo for Deep Learning Projects in NLP, GANs, Computer Vision ☆19 · May 6, 2018 · Updated 7 years ago
- data-warehouse-snowflake-for-data-engineering ☆19 · Sep 14, 2023 · Updated 2 years ago