Data and code for "Fast Data Applications with Spark and Python"
☆25Sep 11, 2016Updated 9 years ago
Alternatives and similar repositories for spark-workshop
Users that are interested in spark-workshop are comparing it to the libraries listed below.
- Code & Data for Introduction to Machine Learning with Scikit-Learn☆81Sep 7, 2018Updated 7 years ago
- Minimum Entropy is a DDL hosted question/answer site for beginners who need answers to Data Science questions.☆16Jul 11, 2016Updated 9 years ago
- Code and Notebooks for the Natural Language Processing with Python course.☆64Dec 3, 2017Updated 8 years ago
- Text similarity based on Word2Vec vectors.☆10Feb 7, 2017Updated 9 years ago
- Pine: Machine Learning Prediction As A Service☆18Feb 28, 2017Updated 9 years ago
- High Level Kafka Scanner☆19Sep 29, 2017Updated 8 years ago
- Dirichlet process mixture model (DPMM) for datamicroscopes☆14Oct 9, 2015Updated 10 years ago
- A web application that identifies party in political discourse and an example of operationalized machine learning.☆29Aug 17, 2018Updated 7 years ago
- Legoo: A collection of automation modules to build analytics infrastructure☆20Jul 24, 2020Updated 5 years ago
- Code & data for Fast data processing with Spark V2☆14Feb 1, 2015Updated 11 years ago
- A generator for synthetic streams of financial transactions.☆16Feb 3, 2014Updated 12 years ago
- Fraud Detection Online (Hadoop application)☆18Apr 8, 2014Updated 12 years ago
- Coding exercises for Apache Spark☆103Jun 4, 2015Updated 11 years ago
- Workshop: Python for Data Science☆63Nov 24, 2014Updated 11 years ago
- Parallel Genomic Analysis Toolkit☆14Feb 11, 2019Updated 7 years ago
- Tribe extracts a network from an email mbox and writes it to a graphml file for visualization and analysis.☆81Apr 15, 2023Updated 3 years ago
- Amazon access control challenge☆25Jun 21, 2014Updated 12 years ago
- BerkeleyX: CS100.1x, Introduction to Big Data with Apache Spark☆10Jul 27, 2015Updated 10 years ago
- Code for the "Burn CPU, burn" competition at Kaggle. Uses Extreme Learning Machines and hyperopt.☆33Jun 25, 2014Updated 12 years ago
- A simple way to fetch and convert open datasets involving Portland, Oregon.☆79May 22, 2016Updated 10 years ago
- AWS, Vagrant, and Spark☆21Nov 10, 2015Updated 10 years ago
- Assignments of CS100.1x, Introduction to Big Data with Apache Spark☆18Jun 29, 2015Updated 11 years ago
- Scripts to Analyze Pronto's Data Release☆22Nov 12, 2015Updated 10 years ago
- Language Modeling with Sum-Product Networks☆20Jul 29, 2014Updated 11 years ago
- Code to accompany the paper "k-Stochastic Neighbor Embeddings for Supervised and Unsupervised Learning, ICML 2013".☆26Jun 8, 2016Updated 10 years ago
- ☆41Jul 24, 2015Updated 10 years ago
- Multidimensional data explorer and visualization tool.☆56May 23, 2017Updated 9 years ago
- Solution code from my winning submission to Kaggle's PyCon 2015 competition☆55Apr 9, 2015Updated 11 years ago
- Tabula Rasa Tic-Tac-Toe☆10Jan 3, 2019Updated 7 years ago
- CLI utility to spider websites and extract links to data files☆13Mar 18, 2015Updated 11 years ago
- Elixir Beacon Reference Implementation. Latest release is compliant with v1.1.0 of the specification.☆14Jun 19, 2020Updated 6 years ago
- ☆47May 11, 2016Updated 10 years ago
- Experiments with Data Analysis☆19Mar 3, 2014Updated 12 years ago
- A generic interface wrapping multiple backends to provide a consistent pubsub API☆13Oct 31, 2018Updated 7 years ago
- ☆13Oct 5, 2022Updated 3 years ago
- Oracle Data Science Bootcamp 2014☆24Apr 8, 2015Updated 11 years ago
- Solution to the Higgs Boson Machine Learning Challenge on Kaggle☆32Sep 16, 2014Updated 11 years ago
- rddapp: Regression Discontinuity Design Application☆12Sep 2, 2025Updated 10 months ago
- Sync Scroll: A browser extension that synchronizes scrolling across multiple tabs.☆18Updated this week