Map-reduce, streaming analysis, and external memory algorithms, and their implementation using Hadoop and its ecosystem: HBase, Hive, Pig, and Spark. The class will include assignments that analyze large existing databases.
☆34 · Apr 3, 2017 · Updated 9 years ago
Alternatives and similar repositories for DSE230_Data_Analysis_Using_Hadoop_and_Spark_UCSD
Users that are interested in DSE230_Data_Analysis_Using_Hadoop_and_Spark_UCSD are comparing it to the libraries listed below.
- Homework/Classwork for my DSE 200 Python for Data Analysis Class at UC San Diego (UCSD) ☆102 · Aug 4, 2016 · Updated 9 years ago
- This is a general purpose wrapper for converting Datalog queries to Neo4J graph database ☆10 · Dec 9, 2016 · Updated 9 years ago
- Installations for Data Science. Anaconda, RStudio, Spark, TensorFlow, AWS (Amazon Web Services). ☆236 · Mar 8, 2023 · Updated 3 years ago
- ☆10 · May 4, 2019 · Updated 7 years ago
- This is the official repository for the paper "Words That Unite The World: A Unified Framework for Deciphering Global Central Bank Commun… ☆20 · Oct 19, 2025 · Updated 6 months ago
- Python tutorials and puzzles to share with the world! ☆170 · Sep 28, 2017 · Updated 8 years ago
- Minimum Entropy is a DDL hosted question/answer site for beginners who need answers to Data Science questions. ☆16 · Jul 11, 2016 · Updated 9 years ago
- ☆24 · Aug 6, 2021 · Updated 4 years ago
- Source code for 'Up and Running with DAX for Power BI' by Alison Box ☆12 · Jun 10, 2022 · Updated 3 years ago
- The goal of this project is to analyse the impact of Covid-19 on the Aviation industry through data engineering processes using technolog… ☆13 · Jun 26, 2022 · Updated 3 years ago
- Convex Formulation of Multiple Instance Learning from Positive and Unlabeled Bags ☆10 · Apr 28, 2018 · Updated 8 years ago
- ☆18 · Aug 15, 2022 · Updated 3 years ago
- Material for UW Extension Data Science 350 ☆19 · Dec 31, 2017 · Updated 8 years ago
- Repo for Coursera.com online course: Statistical Inference ☆10 · Aug 1, 2014 · Updated 11 years ago
- An example CI/CD pipeline using GitHub Actions for doing continuous deployment of AWS Glue jobs built on PySpark and Jupyter Notebooks. ☆13 · Oct 15, 2020 · Updated 5 years ago
- Extract, transform, and load data for analytic processing using AWS Glue ☆17 · May 2, 2021 · Updated 5 years ago
- At the time of exams most of the time student share their notes via social media and after the exam gets over it become really difficut t… ☆14 · May 29, 2018 · Updated 7 years ago
- Apache Spark Guide ☆37 · Feb 1, 2022 · Updated 4 years ago
- Road extraction with deep learning from high resolution satellite images. ☆13 · Sep 16, 2021 · Updated 4 years ago
- ☆18 · Nov 16, 2018 · Updated 7 years ago
- In the Data Science and Engineering program, engineering professionals combine the skills of software programmer, database manager, and s… ☆29 · Nov 4, 2017 · Updated 8 years ago
- Houses price prediction web app ☆11 · Feb 20, 2026 · Updated 2 months ago
- Building pipeline to process the real-time data using Spark and Mongodb. ☆12 · Oct 30, 2019 · Updated 6 years ago
- Currency Portfolio Optimization - IPython notebook and data ☆26 · Dec 21, 2015 · Updated 10 years ago
- Public GitHub repo for SciPy 2022 tutorial (Introduction to Numerical Computing With NumPy) ☆13 · Aug 24, 2022 · Updated 3 years ago
- Code relating to the Coursera Bioinformatics Specialization as well as my own genetic algorithm experiment. ☆11 · Apr 19, 2019 · Updated 7 years ago
- The code and other files related to the Udacity Artificial Intelligence Nanodegree Machine Translation project. ☆10 · Apr 1, 2018 · Updated 8 years ago
- A code sample that allows you to send a payload from the Twitter API to Google Sheets. ☆18 · Mar 23, 2021 · Updated 5 years ago
- Python scripts to facilitate easy working ☆11 · Mar 23, 2026 · Updated last month
- Analyzing Airline data to predict delays ☆19 · May 15, 2014 · Updated 11 years ago
- Group project for the WorldQuant University module, risk management. ☆13 · Feb 3, 2019 · Updated 7 years ago
- Pytorch implementation of DeepLOB-ATT and DeepLOB-Seq2Seq from Multi Horizon Forecasting for Limit Order Books ☆14 · Feb 4, 2023 · Updated 3 years ago
- A Spark Reliability Testing Suite ☆13 · Jan 10, 2017 · Updated 9 years ago
- Repository for sharing the knowledge from the learning path of Kaggle Learning. All contributions welcome :). ☆156 · Feb 1, 2018 · Updated 8 years ago
- Python tutorials in both Jupyter Notebook and youtube format. ☆1,252 · Apr 17, 2026 · Updated 3 weeks ago
- Materials and code relating to Learning Intelligence 25. ☆11 · Mar 23, 2018 · Updated 8 years ago
- ☆17 · May 16, 2020 · Updated 5 years ago
- ☆10 · Jan 23, 2019 · Updated 7 years ago
- ☆20 · Aug 20, 2016 · Updated 9 years ago