Repository for Apache Spark course at Team Data Science
☆17 · Oct 23, 2020 · Updated 5 years ago
Alternatives and similar repositories for learning-apache-spark
Users who are interested in learning-apache-spark are comparing it to the libraries listed below.
- Course Material Data Engineering on AWS Course ☆31 · Sep 9, 2024 · Updated last year
- Build a Content-Based Movie Recommender System (TF-IDF, BM25, BERT) ☆14 · Jun 13, 2022 · Updated 3 years ago
- Sample Project to Learn Data Engineering ☆10 · Aug 1, 2021 · Updated 4 years ago
- Dockerizing and Consuming an Apache Livy environment ☆13 · Jun 29, 2022 · Updated 3 years ago
- A parallel implementation of the bzip2 data compressor in python, this data compression pipeline is using algorithms like Burrows–Wheeler… ☆13 · Jun 29, 2022 · Updated 3 years ago
- A blazing fast way to insert GeoJSON, ShapeFile & OsmPBF into a PostGIS database. ☆12 · Sep 11, 2024 · Updated last year
- Dockerizing a Python Script for Web Scraping and consume the scraped data using FastApi (www.metroscubicos.com) ☆15 · Dec 16, 2021 · Updated 4 years ago
- Workshop for 2020 Apache Beam Summit: using Beam to build data pipelines for deep learning. ☆11 · Aug 24, 2020 · Updated 5 years ago
- Data sets and ML models versioning example from DVC get started ☆10 · Jun 4, 2024 · Updated last year
- This app collects data from OSM(open street maps). You can change queries according to your need and use it for data extraction. ☆13 · Apr 29, 2020 · Updated 6 years ago
- A Python package for interactive mapping ☆20 · May 8, 2021 · Updated 4 years ago
- Scheduling Big Data Workloads and Data Pipelines in the Cloud with pyDag ☆23 · Sep 19, 2022 · Updated 3 years ago
- Code Repository for GCP: Complete Google Data Engineer and Cloud Architect Guide(v), Published by Packt ☆16 · Jan 30, 2023 · Updated 3 years ago
- The goal of this project is to identify students at risk of dropping out the school ☆22 · May 7, 2021 · Updated 4 years ago
- Labs and demos for courses in the Data Engineer track of GCP Training (http://cloud.google.com/training). ☆16 · Oct 28, 2019 · Updated 6 years ago
- ☆18 · Jul 24, 2019 · Updated 6 years ago
- This is a simple Python library for interacting with the REST interface of an instance of Cordra ☆10 · May 20, 2022 · Updated 3 years ago
- Simulator for cellular automata defined on regular lattices on Minkovski plane ☆11 · Apr 8, 2026 · Updated 3 weeks ago
- Spark ML with pyspark ☆71 · Feb 10, 2023 · Updated 3 years ago
- my web gis maps ☆23 · Dec 18, 2023 · Updated 2 years ago
- ☆31 · Dec 26, 2025 · Updated 4 months ago
- A self-contained, queryable knowledge graph of tech skills and IT stuff; maintained with git ☆18 · Nov 14, 2023 · Updated 2 years ago
- Pipeline for processing JWST imaging data, tailored for nearby galaxies. Built for PHANGS ☆21 · Apr 14, 2026 · Updated 2 weeks ago
- ☆12 · Apr 21, 2021 · Updated 5 years ago
- Curated list of industry data science blogs ☆12 · Dec 21, 2016 · Updated 9 years ago
- This repository contains the data and the code associated to the paper "Hyper-cores promote localization and efficient seeding in higher-… ☆12 · Oct 6, 2023 · Updated 2 years ago
- Implementing RAG with Amazon Bedrock, Amazon Titan, and Amazon OpenSearch Serverless ☆11 · Oct 9, 2023 · Updated 2 years ago
- Visually query Spanner Graph data in notebooks ☆40 · Apr 16, 2026 · Updated 2 weeks ago
- 🔌 Flask S3Viewer is a powerful extension that makes it easy to browse S3 in any Flask application. (Python S3 Uploader / Flask S3 Upload… ☆13 · Jan 8, 2025 · Updated last year
- Instruction tuning dataset generation inspired by LLaVA-Instruct-158k via any LLM, also for commercial use. ☆13 · Mar 13, 2024 · Updated 2 years ago
- Tutorial for building a POC Kafka + Spark + Cassandra pipeline using Scala ☆32 · Apr 13, 2020 · Updated 6 years ago
- Automating Your Data Pipeline with Apache Airflow ☆40 · Sep 1, 2023 · Updated 2 years ago
- Simple Android App showing movement (delta frames) in images using Camera2 API using RenderScript. ☆24 · Nov 13, 2019 · Updated 6 years ago
- This is a repository for the LinkedIn Learning course Practical Python for Data Professionals ☆49 · Jun 12, 2024 · Updated last year
- Engineer streaming processing data pipeline on Azure with the main purpose to ingest and process tweets and satellite images data from Hu… ☆23 · Apr 8, 2021 · Updated 5 years ago
- This is a NBD server for OpenStack Object Storage (Swift) ☆31 · Mar 31, 2016 · Updated 10 years ago
- Processing source code for an animation ☆10 · Jan 28, 2022 · Updated 4 years ago
- notebooks of cool EBM visualizations ☆15 · Feb 12, 2021 · Updated 5 years ago
- Plex TheMovie Database Agent, with ID support ☆12 · Jul 30, 2020 · Updated 5 years ago