A code-based tutorial for production-level data streaming with PySpark (plus Optimus for data cleaning), Confluent Kafka, and Apache Drill, using Docker and Cassandra (a NoSQL database) for storage. This setup allows for fast feature engineering and data cleaning.
☆28Jul 8, 2019Updated 6 years ago
Alternatives and similar repositories for PySpark-Confluent-Kafka-Apache-Drill-
Users who are interested in PySpark-Confluent-Kafka-Apache-Drill- are comparing it to the libraries listed below.
- Udacity Data Engineer Nano Degree - Project-3 (Data Warehouse)☆22Jun 20, 2019Updated 6 years ago
- ☆19Oct 10, 2020Updated 5 years ago
- Stripe Payment Gateway integration in Django☆10May 24, 2021Updated 4 years ago
- A simple POC app on Django framework☆11Feb 14, 2019Updated 7 years ago
- Collection of Databricks and Jupyter Notebooks☆22Feb 9, 2026Updated last month
- Kafka-connect telegram connector☆16Nov 21, 2025Updated 4 months ago
- Version 1 of Habaneras de Lino is an online ecommerce site. This repo contains the backend API of the website using Django and Django Rest Fram…☆14Dec 16, 2022Updated 3 years ago
- MongoDB Change Streams and Kafka Example Application☆14Nov 16, 2017Updated 8 years ago
- Projects from Udacity Data Streaming Nanodegree☆15Aug 14, 2023Updated 2 years ago
- Kaggle Human Protein Atlas Image Classification 73rd solution☆19Jan 14, 2019Updated 7 years ago
- 🍴A responsive restaurant theme built with Bootstrap 4☆14Dec 17, 2018Updated 7 years ago
- Simple application implementing Change Data Capture using Kafka Streams.☆19Dec 31, 2019Updated 6 years ago
- Kaggle solutions☆17Nov 22, 2022Updated 3 years ago
- This repo contains a data science project to identify patients at high-risk of Alzheimer's disease.☆12Feb 20, 2021Updated 5 years ago
- Food Ordering Management System PHP & MySQL Project☆12Dec 9, 2019Updated 6 years ago
- Slides from my talk on spaCy IRL, regarding sparse attention.☆12Jul 9, 2019Updated 6 years ago
- A repository to store articles, links, and other resources the club finds helpful☆10Apr 29, 2019Updated 6 years ago
- Automate claim approval in personal insurance sector.☆20Apr 21, 2016Updated 9 years ago
- This repo is for building Docker containers for RStudio, PostgreSQL, Hadoop, Spark, etc.☆22May 12, 2021Updated 4 years ago
- WARNING: This repository is no longer maintained ⚠️ This repository will not be updated.☆12May 31, 2022Updated 3 years ago
- AWS Big Data Certification☆25Jan 10, 2025Updated last year
- 7th place code at NFL Big Data Bowl☆12Jan 8, 2020Updated 6 years ago
- Code repository for Big Data Analytics with R, published by Packt☆28Mar 2, 2026Updated 3 weeks ago
- Recency, Frequency, and Monetary are three behavioral attributes and are quite simple, in that they can be easily computed for any databa…☆15Nov 20, 2025Updated 4 months ago
- Insurance Claim Prediction using Machine Learning - Udacity Nanodegree Capstone Project☆16Nov 1, 2016Updated 9 years ago
- 🍀 Opinionated LaTeX-based Resume Template for Data Science Role 🍀☆12May 23, 2019Updated 6 years ago
- Classifying malignant and benign tumors using Neural Networks 🔬☆18Jun 4, 2021Updated 4 years ago
- R package 2013 google trend☆15Jan 5, 2015Updated 11 years ago
- QuasiModo: Assessing viral genomic analysis methods on HCMV strain mixture☆12Sep 22, 2022Updated 3 years ago
- A collection of my NLP projects☆19Aug 26, 2019Updated 6 years ago
- This is the behavior scorecard, which includes three modules, including data processing, establishment of score card and effect evaluatio…☆19May 21, 2019Updated 6 years ago
- Slideshow template for Voilà based on RevealJS☆16Nov 17, 2021Updated 4 years ago
- ☆10Feb 14, 2019Updated 7 years ago
- Set up an automated data science environment using Docker☆14Oct 2, 2018Updated 7 years ago
- R-Machine-Learning-Projects☆30Jan 30, 2023Updated 3 years ago
- PySpark, Databrick, h2o, MLlib☆20Aug 25, 2016Updated 9 years ago
- Cognitive Compute aims to present some micro service capabilities as front end to Watson Conversation, Discovery and other bluemix servic…☆11Dec 7, 2018Updated 7 years ago
- The Entire Transcript from the Office in Tidy Format☆26Feb 8, 2023Updated 3 years ago
- Restaurants Menu, Online Menu System - Django + Python☆21Aug 17, 2021Updated 4 years ago