This repository gathers a curated list of resources for learning Hadoop for FREE.
☆57Jun 10, 2018Updated 8 years ago
Alternatives and similar repositories for Learn-Hadoop-and-Spark
Users that are interested in Learn-Hadoop-and-Spark are comparing it to the libraries listed below.
- POC for all the stack of big data (kafka, spark, cassandra, hdfs, docker, springboot)☆12Dec 16, 2022Updated 3 years ago
- Extract, transform, and load data for analytic processing using AWS Glue☆17May 2, 2021Updated 5 years ago
- A consumer of a Kafka topic based on Flink☆12Oct 5, 2022Updated 3 years ago
- A tour of Spring dependency injection styles☆62May 31, 2012Updated 14 years ago
- Full-Text Search System to stream, collect, clean, store and filter data collected from different sources using Docker, Kafka, Elasticsea…☆21Jan 7, 2023Updated 3 years ago
- [易车] Demos for Spark, Flink, HBase, Hive, and Flume that integrate some of Hadoop's native APIs (currently only HDFS and MapReduce); also tests some exception-related functionality☆16Apr 4, 2019Updated 7 years ago
- Collection of Pig scripts that I use for my talks and workshops☆39Apr 30, 2013Updated 13 years ago
- Educational notes and hands-on problems with solutions for the Hadoop ecosystem☆87Jan 22, 2019Updated 7 years ago
- The goal of this project is to analyse the impact of Covid-19 on the Aviation industry through data engineering processes using technolog…☆13Jun 26, 2022Updated 4 years ago
- Big Data Resources and References☆13Sep 4, 2024Updated last year
- Learning PySpark video series☆11Mar 5, 2018Updated 8 years ago
- ☆18Aug 15, 2022Updated 3 years ago
- Apache Spark using SQL☆14Aug 18, 2021Updated 4 years ago
- Listing my favorite research papers 📝 from different fields as I read them.☆10Oct 17, 2019Updated 6 years ago
- An example CI/CD pipeline using GitHub Actions for doing continuous deployment of AWS Glue jobs built on PySpark and Jupyter Notebooks.☆13Oct 15, 2020Updated 5 years ago
- This repo consists of all important concepts for data engineers.☆11Jun 2, 2026Updated 3 weeks ago
- A Hubot script for creating quick reminders through natural language.☆11Jun 29, 2017Updated 9 years ago
- Self-contained examples using Apache Spark with the functional features of Java 8☆64Apr 8, 2018Updated 8 years ago
- Material for my session at Indian Institute of Science, Bangalore 2019 for humans.☆12Aug 24, 2019Updated 6 years ago
- This repository is to help with the Partner Demonstration of the Apache Atlas project.☆30Oct 29, 2015Updated 10 years ago
- Gobblin is a distributed big data integration framework (ingestion, replication, compliance, retention) for batch and streaming systems.…☆11Jul 29, 2017Updated 8 years ago
- Apache Spark Structured Streaming with Kafka using Python (PySpark)☆40May 16, 2019Updated 7 years ago
- Social Media Analysis, scalable solution, flexible deployment that analyses social media contents☆10Jul 20, 2023Updated 2 years ago
- Ambari service for RedHat FreeIPA☆11Sep 30, 2016Updated 9 years ago
- List of playbooks to manage Ambari☆13Oct 3, 2018Updated 7 years ago
- Run dynamic SQL in SQL. This package allows queries with an unknown number of select-list items and can solve challenging problems like d…☆12Oct 5, 2024Updated last year
- A Python script to swoop and decrypt passwords from Chrome's local storage.☆11Dec 10, 2018Updated 7 years ago
- ☆18Nov 16, 2018Updated 7 years ago
- Building pipeline to process the real-time data using Spark and Mongodb.☆12Oct 30, 2019Updated 6 years ago
- Public GitHub repo for SciPy 2022 tutorial (Introduction to Numerical Computing With NumPy)☆13Aug 24, 2022Updated 3 years ago
- Avro Schema Shredder is a REST API that enables storage of Avro Schemas in Apache Atlas. This API enables an organization to use Apache A…☆13Jan 11, 2017Updated 9 years ago
- Python scripts for Agisoft Photoscan☆12Jun 18, 2015Updated 11 years ago
- Docker Image packaging for Pentaho BI Server☆10Jul 6, 2015Updated 10 years ago
- Dump the saved wifi passwords for windows using regular expressions and python 3☆17Dec 22, 2016Updated 9 years ago
- Packer Template to build a AWS Apache Cassandra AMI☆10Jan 3, 2022Updated 4 years ago
- A code sample that allows you to send a payload from the Twitter API to Google Sheets.☆17Mar 23, 2021Updated 5 years ago
- ☆20Feb 24, 2020Updated 6 years ago
- Various data stream/batch process demo with Apache Scala Spark 🚀☆12Feb 28, 2020Updated 6 years ago
- Adds a framework to enable Natural Language interactions in your Hubot scripts☆11Dec 6, 2016Updated 9 years ago