Map-reduce, streaming analysis, and external memory algorithms and their implementation using Hadoop and its ecosystem: HBase, Hive, Pig and Spark. The class will include assignments on analyzing large existing databases.
☆34 Apr 3, 2017 Updated 8 years ago
Alternatives and similar repositories for DSE230_Data_Analysis_Using_Hadoop_and_Spark_UCSD
Users that are interested in DSE230_Data_Analysis_Using_Hadoop_and_Spark_UCSD are comparing it to the libraries listed below.
- Repo for my graduate data science machine learning class at UCSD (UC San Diego). This course provides a broad introduction to the practic… ☆54 Mar 26, 2018 Updated 8 years ago
- Probability and Statistics Using Python Data Science Masters Course at UCSD (DSE 210) ☆182 Aug 21, 2017 Updated 8 years ago
- Database Management Systems Data Science Masters Course (DSE 201) ☆12 Jun 26, 2016 Updated 9 years ago
- ☆10 May 4, 2019 Updated 6 years ago
- Coursera machine learning specialization coursework (python based, University of Washington). ☆18 Mar 28, 2016 Updated 10 years ago
- ☆12 Apr 27, 2018 Updated 7 years ago
- Minimum Entropy is a DDL hosted question/answer site for beginners who need answers to Data Science questions. ☆16 Jul 11, 2016 Updated 9 years ago
- ☆18 Aug 15, 2022 Updated 3 years ago
- Repo for Coursera.com online course: Statistical Inference ☆10 Aug 1, 2014 Updated 11 years ago
- An example CI/CD pipeline using GitHub Actions for doing continuous deployment of AWS Glue jobs built on PySpark and Jupyter Notebooks. ☆13 Oct 15, 2020 Updated 5 years ago
- Apache Spark Guide ☆35 Feb 1, 2022 Updated 4 years ago
- ☆14 Jan 22, 2019 Updated 7 years ago
- Coursera Quiz Solutions ☆11 Aug 11, 2022 Updated 3 years ago
- In the Data Science and Engineering program, engineering professionals combine the skills of software programmer, database manager, and s… ☆29 Nov 4, 2017 Updated 8 years ago
- Evaluation Metrics for the Hewlett Foundation's Automated Essay Scoring competition ☆43 Feb 23, 2012 Updated 14 years ago
- Data Science Repo and blog for Johns Hopkins Coursera Courses. Please let me know if you have any questions. ☆2,248 Mar 8, 2023 Updated 3 years ago
- Code relating to the Coursera Bioinformatics Specialization as well as my own genetic algorithm experiment. ☆11 Apr 19, 2019 Updated 6 years ago
- The code and other files related to the Udacity Artificial Intelligence Nanodegree Machine Translation project. ☆10 Apr 1, 2018 Updated 7 years ago
- Deep Learning Part 2, 2019 edition - transcriptions, screenshots and notebooks ☆11 Jul 19, 2019 Updated 6 years ago
- A code sample that allows you to send a payload from the Twitter API to Google Sheets. ☆18 Mar 23, 2021 Updated 5 years ago
- Python scripts to facilitate easy working ☆11 Updated this week
- Computer Science, Data Science and ML Fundamentals ☆11 May 30, 2025 Updated 9 months ago
- Analyzing Airline data to predict delays ☆19 May 15, 2014 Updated 11 years ago
- Pytorch implementation of DeepLOB-ATT and DeepLOB-Seq2Seq from Multi Horizon Forecasting for Limit Order Books ☆14 Feb 4, 2023 Updated 3 years ago
- Repository for sharing the knowledge from the learning path of Kaggle Learning. All contributions welcome :). ☆156 Feb 1, 2018 Updated 8 years ago
- Materials and code relating to Learning Intelligence 25. ☆10 Mar 23, 2018 Updated 8 years ago
- Analytics projects using Big Data eco-systems (Hadoop, Spark, Storm) ☆17 Dec 27, 2021 Updated 4 years ago
- ☆10 Jan 23, 2019 Updated 7 years ago
- ☆20 Aug 20, 2016 Updated 9 years ago
- ☆13 Mar 31, 2019 Updated 6 years ago
- Workshop materials for scraping Twitter with Python ☆13 May 25, 2016 Updated 9 years ago
- This repository is for demonstrating the capability to do SQL-based UPDATES, DELETES, and INSERTS directly in the Data Lake using Amazon … ☆18 Aug 25, 2021 Updated 4 years ago
- Slides, material and solutions of the popular Statistical Learning course from Stanford's own Hastie & Tibshirani. Join me on my journey … ☆16 Mar 9, 2018 Updated 8 years ago
- ☆15 Feb 20, 2026 Updated last month
- Sends public ip through e-mail. Command-line standalone. ☆15 Oct 16, 2016 Updated 9 years ago
- A tutorial on building a real-time data streaming application pipeline with Apache Kafka🔥🔥🔥 ☆24 Apr 29, 2022 Updated 3 years ago
- HackerNews reader ☆10 Nov 13, 2015 Updated 10 years ago
- Examples for the FORM+CODE book ☆20 Oct 2, 2015 Updated 10 years ago
- Language Modelling, CMI vs Perplexity ☆11 Mar 17, 2018 Updated 8 years ago