newfront/spark-intro-to-ml


A Gentle introduction to Machine Learning with Apache Spark

☆11

Alternatives and similar repositories for spark-intro-to-ml

Users who are interested in spark-intro-to-ml are comparing it to the repositories listed below.

simonprickett / google-cloud-functions-python
Google Cloud Functions Python Runtime Demo
☆12 · Jul 27, 2018 · Updated 8 years ago
eadgbear / spark-wasm-udf
Using WASM to write UDFs in Apache Spark
☆12 · Jun 3, 2024 · Updated 2 years ago
bartosz25 / acid-file-formats
Code for Apache Hudi, Apache Iceberg and Delta Lake analysis
☆10 · Feb 2, 2024 · Updated 2 years ago
richardanaya / spark_delta_lake
☆16 · Jun 27, 2020 · Updated 6 years ago
slmttndrk / Turkish_Sentiment_Analysis_With_Multinomial_Naive_Bayes
THIS PROJECT IS ABOUT TURKISH SENTIMENT ANALYSIS
☆14 · Aug 23, 2019 · Updated 6 years ago
kelseyhightower / kubernetes-letsencrypt-tutorial
WIP: Kubernetes Lets Encrypt Tutorial
☆27 · Jul 18, 2016 · Updated 10 years ago
masfworld / cdc_deltaLake
Docker compose and Google Colab demo to build a CDC with Delta Lake
☆15 · Sep 7, 2022 · Updated 3 years ago
awslabs / amazon-s3-tagging-spark-util
☆12 · Oct 16, 2023 · Updated 2 years ago
aws-samples / dbtgluenyctaxidemo
☆11 · Oct 11, 2022 · Updated 3 years ago
tecton-ai / apply-workshop-2022
☆17 · Aug 5, 2022 · Updated 3 years ago
bartosz25 / data-ai-summit-2024
Visits sessionization pipeline used for the talk
☆13 · May 28, 2024 · Updated 2 years ago
aws-samples / amazon-emr-optimize-data-processing
Optimizing downstream data processing with Amazon Kinesis Data Firehose and Amazon EMR running Apache Spark
☆14 · Apr 14, 2023 · Updated 3 years ago
joomcode / spark-platform
Basic Spark utilities
☆13 · Updated this week
ognis1205 / mcp-server-unitycatalog
Unity Catalog AI Model Context Protocol Server
☆15 · Mar 28, 2025 · Updated last year
zhang943 / Spark-Apriori
An implementation of apriori algorithm under spark platform
☆11 · Dec 13, 2018 · Updated 7 years ago
AlexMercedCoder / understanding_data_with_alex_merced
repo with resources from Understanding Data with Alex Merced videos
☆14 · Jan 20, 2024 · Updated 2 years ago
mamacker / axieExt
Freak's Axie Extension
☆11 · Dec 17, 2021 · Updated 4 years ago
jamartinh / Orange3-Spark
A set of widgets for Python's Orange Machine Learning to work with Apache Spark ML
☆15 · Dec 24, 2016 · Updated 9 years ago
lavinjj / angularjs-modules-for-great-justice
Sample code for blog post AngularJS Modules for Great Justice
☆31 · Apr 18, 2013 · Updated 13 years ago
dremio-hub / dremio-hbase-connector
Dremio Community Connector for HBase
☆12 · Nov 7, 2024 · Updated last year
mandiberg / MappingWikipedia
Code that maps Wikipedia contributions by IP address
☆16 · Oct 2, 2024 · Updated last year
bufbuild / registry-proto
BSR's new public API. Currently in development.
☆22 · Jul 20, 2026 · Updated last week
zeroc0d3lab / awesome-scalability
Daily-updated reading list for designing High Scalability, High Availability, High Stability back-end systems - Pull requests are gre…
☆15 · Jul 14, 2022 · Updated 4 years ago
themeteorchef / how-to-build-a-react-native-app-with-meteor
☆15 · Aug 21, 2017 · Updated 8 years ago
whole-tale / all-spark-notebook
Jupyter Notebook with Spark support extracted from jupyter/docker-stack
☆19 · Jul 4, 2018 · Updated 8 years ago
jefftriplett / hubcap
Hubcap is an autonomous AI agent in 25 lines of code: a small Autobot that you can't trust. *This is the Python fork/port* from https://g…
☆22 · Nov 10, 2025 · Updated 8 months ago
bobmshannon / Simple-32bit-ALU-Design
A simple, working, 32-bit ALU design.
☆14 · Dec 26, 2014 · Updated 11 years ago
bgando / object-exercises
☆11 · Nov 8, 2017 · Updated 8 years ago
mark-hoffmann / fastteradata
Tools for faster and optimized interaction with Teradata and large datasets.
☆17 · Jul 11, 2018 · Updated 8 years ago
huseyinbabal / nodeschool-restful-api
Sample RESTful API for NodeSchool Workshop
☆15 · Sep 13, 2016 · Updated 9 years ago
wricardo / grpcurl-mcp
Model Context Protocol (MCP) server to interact with gRPC services using the grpcurl tool
☆17 · Mar 5, 2025 · Updated last year
substrait-io / substrait-validator
☆15 · Jul 21, 2026 · Updated last week
DhruvKumar / stocks-dashboard-lab
This lab teaches you how to create a realtime dashboard of stock prices using Hortonworks Data Platform and NiFi
☆23 · Jan 18, 2016 · Updated 10 years ago
ryhan / NLP-project
11411 Natural Language Processing Final Project. Reads wikipedia articles, and then can both answer natural-language questions about the …
☆22 · Apr 16, 2013 · Updated 13 years ago
parthxparab / LinkedInJobScript
JavaScript Script to remove all expired jobs at once
☆12 · Feb 13, 2022 · Updated 4 years ago
renardeinside / pyspark-logging-examples
Writing PySpark logs in Apache Spark and Databricks
☆16 · Jun 13, 2022 · Updated 4 years ago
peterroelants / notebooks
Collection of notebooks
☆17 · Oct 27, 2024 · Updated last year
cerndb / sparkMeasure
This is a mirror of https://github.com/LucaCanali/sparkMeasure - sparkMeasure is a tool for performance troubleshooting of Apache Spark w…
☆16 · May 21, 2026 · Updated 2 months ago
ronald-smith-angel / owl-data-sanitizer
A pyspark lib to validate data quality
☆19 · Nov 11, 2022 · Updated 3 years ago