jamesbyars / apache-spark-etl-pipeline-example
Demonstration of using Apache Spark to build robust ETL pipelines while taking advantage of open source, general purpose cluster computing.
☆24Aug 11, 2023Updated 2 years ago
Alternatives and similar repositories for apache-spark-etl-pipeline-example
Users that are interested in apache-spark-etl-pipeline-example are comparing it to the libraries listed below
- Example Python and R code for Cloudera Machine Learning (CML) training☆14Dec 1, 2020Updated 5 years ago
- Spark data pipeline that processes movie ratings data.☆31Feb 5, 2026Updated last week
- I'll munch some data here☆12Jun 18, 2021Updated 4 years ago
- Spark on Kubernetes infrastructure Docker images repo☆38Oct 20, 2022Updated 3 years ago
- A Scala library for locality sensitive hashing☆14Aug 1, 2018Updated 7 years ago
- Did public figures get richer between two elections?☆10Dec 8, 2022Updated 3 years ago
- ☆11Mar 27, 2024Updated last year
- A-Frame pipe for interop with Angular☆11Apr 18, 2018Updated 7 years ago
- BigQuery Data Connector for Dremio☆12Sep 29, 2023Updated 2 years ago
- Movie Recommendation System Using Spark ML, Akka and Cassandra☆12Oct 4, 2019Updated 6 years ago
- Sanic application fully integrated with Motor + UMongo☆10Aug 6, 2022Updated 3 years ago
- Assignments for UC San Diego's Hadoop Platform and Application Framework class on Coursera☆10Jan 27, 2016Updated 10 years ago
- A Python + NLTK Text Mining Open Course☆11Feb 3, 2018Updated 8 years ago
- Big Data fundamentals with Apache Hadoop☆13Jul 1, 2022Updated 3 years ago
- Code and materials for the talk "Set practice: coding & using sets in Go"☆11Jul 8, 2025Updated 7 months ago
- A Spark datasource for the HadoopCryptoLedger library☆13Sep 29, 2025Updated 4 months ago
- Fetch & sync videos and channels from Youtube API V3☆16Jul 23, 2015Updated 10 years ago
- ☆11Jul 21, 2023Updated 2 years ago
- My notes while learning datascience☆10May 26, 2018Updated 7 years ago
- Sketching data structures for scala, including t-digest☆15Sep 7, 2021Updated 4 years ago
- ☆12Oct 10, 2023Updated 2 years ago
- A bunch of low-level basic methods for data processing and monitoring with Scala Spark☆10Jun 29, 2018Updated 7 years ago
- MOAI, an Open Access Server Platform for Institutional Repositories☆15Apr 21, 2023Updated 2 years ago
- Scripts for downloading the open data of the state of Paraíba available at https://dados.pb.gov.br/☆10Dec 8, 2022Updated 3 years ago
- Geometrical Face Features Extraction☆16Mar 30, 2013Updated 12 years ago
- Various data stream/batch processing demos with Apache Spark in Scala 🚀☆11Feb 28, 2020Updated 5 years ago
- Example project on how to do state recovery in Apache Flink using Apache Avro☆12May 7, 2018Updated 7 years ago
- Example repository demonstrating python3.7 in travis-ci☆12Jan 30, 2019Updated 7 years ago
- Example project for consuming an AWS Kinesis stream and saving data to Amazon Redshift using Apache Spark☆11May 22, 2018Updated 7 years ago
- How to get started with a Machine Learning or Data Science project: Exploratory Data Analysis, step by step☆12Oct 7, 2020Updated 5 years ago
- Script that generates index.html files for an S3 bucket, enabling browsing from a web browser.☆13Feb 6, 2025Updated last year
- A research project to investigate using GeoTrellis as a REST service☆14Jul 9, 2018Updated 7 years ago
- API Testing using Python☆16Jan 8, 2025Updated last year
- Integration with the Vindi API (Python 3.5+)☆14Oct 2, 2019Updated 6 years ago
- Projects from my Hadoop training sessions☆16Feb 22, 2018Updated 7 years ago
- Class notes from DIO's Aceleração Dev #4 on Data Engineering, taught by Everis.☆13Feb 6, 2021Updated 5 years ago
- Apache Hadoop - Docker distribution based on CentOS 7 and Oracle Java 8☆12Feb 20, 2018Updated 7 years ago
- ☆12Nov 13, 2018Updated 7 years ago
- Cloud Spanner Connector for Apache Spark☆17Updated this week