Some exercises to learn Spark. Solved in Python.
☆21Oct 15, 2024Updated last year
Alternatives and similar repositories for spark-exercises
Users who are interested in spark-exercises are comparing it to the libraries listed below.
- Getting started with Spark, Spark streaming, Spark SQL and DataFrame.☆49May 15, 2018Updated 8 years ago
- Run an open-source data LakeHouse locally using Docker Compose☆12May 31, 2024Updated 2 years ago
- Data Pipeline that utilizes GCP, Python 3.10, Prefect, and more.☆10Jan 23, 2023Updated 3 years ago
- Files for the Docker and Kubernetes on Google Cloud Hands-On labs☆11Mar 14, 2023Updated 3 years ago
- IoT, Big Data Analytics using Apache Kafka, Spark and other AWS services☆16Sep 11, 2020Updated 5 years ago
- Coursera, Big Data Essentials: HDFS, MapReduce and Spark RDD☆12Jun 18, 2019Updated 7 years ago
- An End-to-End ETL data pipeline that leverages pyspark parallel processing to process about 25 million rows of data coming from a SaaS ap…☆25Dec 7, 2022Updated 3 years ago
- Trino monitoring with JMX metrics through Prometheus and Grafana☆17Aug 14, 2024Updated last year
- ☆19Apr 9, 2020Updated 6 years ago
- Spark-based pipeline to extract and parse monthly games from the Lichess database.☆22Sep 22, 2025Updated 8 months ago
- Apache Spark (PySpark) Practice on Real Data☆270Jan 31, 2020Updated 6 years ago
- Demo showcasing Spark Streaming, Kafka, Kudu - all in Python☆27Jun 12, 2017Updated 9 years ago
- End-to-End deployment of E-commerce customers segmentation using Clustering Machine learning algorithms in Google Cloud Platform and MLOp…☆19Jun 5, 2024Updated 2 years ago
- TorBOX Next Generation — Build a manageable TOR/I2P middle box on any modern Linux.☆29Oct 2, 2023Updated 2 years ago
- A forwarding mail server inspired by @alum.mit.edu☆20Mar 22, 2016Updated 10 years ago
- pytest support for airflow☆12Apr 20, 2021Updated 5 years ago
- The Data Pipeline and Analytics Stack is a comprehensive solution designed for processing, storing, and visualizing data. Explore a compl…☆18Dec 26, 2023Updated 2 years ago
- Demo repository for TestKube - an opinionated and friendly Kubernetes testing framework!☆12Oct 24, 2024Updated last year
- Open episode of the data engineering practice course☆32Jul 2, 2024Updated last year
- A tiny wiki engine. (Fossil Export)☆13Jul 29, 2023Updated 2 years ago
- ☆12Oct 9, 2021Updated 4 years ago
- Kustomize base manifests for Thanos☆15May 7, 2026Updated last month
- A set of protocols for remote connection, for two people to connect while apart.☆10Sep 20, 2022Updated 3 years ago
- Ai-Thinker A9 GPRS AT Module Related☆36May 9, 2018Updated 8 years ago
- Airflow plugin to create/edit Dags via drag-and-drop on a convenient UI☆16Aug 27, 2023Updated 2 years ago
- blog together with your cyber buds. spatial live pseudonymous multiplayer journaling☆12Jul 28, 2021Updated 4 years ago
- 📕 Writing tests, the DataMade way☆16Sep 24, 2020Updated 5 years ago
- ☆14May 6, 2022Updated 4 years ago
- Tools for debugging memory leaks in R☆13Dec 11, 2023Updated 2 years ago
- Cloudera_Material: Study Material to help people preparing for Cloudera CCA Spark and Hadoop Developer Exam (CCA175). Feel free to collab…☆42Apr 21, 2020Updated 6 years ago
- A tool for detecting anomalies in time series data☆11Dec 1, 2022Updated 3 years ago
- Demonstration project for building out a data news rig.☆10Mar 15, 2022Updated 4 years ago
- This repository is a voice search demo using OpenAI Whisper, DuckDB, and the Metaphone algorithm. The associated blog post is here: https:…☆13May 15, 2024Updated 2 years ago
- This is BetaNYC. Here, you can comment on who we are.☆11Dec 1, 2020Updated 5 years ago
- visualize an AST serialized as YAML☆13Mar 13, 2023Updated 3 years ago
- ⏰ Fetch and clean data on a schedule, using GitHub Actions + R☆10Aug 30, 2022Updated 3 years ago
- ☆15Aug 4, 2025Updated 10 months ago
- Chrome extension that automatically saves liked / bookmarked tweets to Are.na☆16May 27, 2023Updated 3 years ago
- Generate diff comment between two directories in GitHub Actions☆21Updated this week