The repository for the CMU Data Pipeline course. This year's course should use the 2017 branch.
☆40 · May 2, 2017 · Updated 9 years ago
Alternatives and similar repositories for data
Users who are interested in data are comparing it to the libraries listed below.
- Implementation of Web Log Analysis in Scala and Apache Spark ☆10 · Feb 8, 2015 · Updated 11 years ago
- A Real-time Apache log monitor using Kafka & Spark Streaming, with fake log generator. ☆24 · Feb 19, 2020 · Updated 6 years ago
- A library that will eventually help people wanting to do Data Mining on Twitter ☆23 · Jan 25, 2023 · Updated 3 years ago
- featselector is a feature selector based on statistical analysis and model selection. ☆14 · Mar 4, 2019 · Updated 7 years ago
- Flask-based application using MySQL, MongoDB and Neo4j for storing video data and provides interface to search video and show related vid… ☆11 · Apr 23, 2017 · Updated 9 years ago
- This library is a wrapper for sklearn and works with data stored using Pandas module. ☆17 · Mar 2, 2016 · Updated 10 years ago
- Real-time Machine Learning with Apache Spark on Twitter Public Stream ☆68 · Apr 27, 2017 · Updated 9 years ago
- Complete Pipeline Training at Big Data Scala By the Bay ☆71 · Oct 27, 2015 · Updated 10 years ago
- Pyspark Spotify ETL ☆17 · Aug 19, 2021 · Updated 4 years ago
- This repo demonstrates how to load a sample Parquet formatted file from an AWS S3 Bucket. A python job will then be submitted to a Apach… ☆19 · Jun 23, 2016 · Updated 9 years ago
- This data analysis provided information for the March 6th, 2018, NYC Open Data Week event hosted by the Two Sigma Data Clinic, "The State… ☆13 · Jan 9, 2025 · Updated last year
- python interface to bnlearn and other probabilistic graphical model libraries ☆10 · Mar 26, 2020 · Updated 6 years ago
- CEVAE with VampPrior ☆11 · Jul 18, 2018 · Updated 7 years ago
- Data pipeline for Sina Weibo Interaction-prediction ☆54 · Oct 21, 2015 · Updated 10 years ago
- Solutions to the book "Collection of Data Science TakeHome Challenges" in Python. ☆10 · Nov 15, 2017 · Updated 8 years ago
- My Data Engineering project @ Insight Data Science ☆10 · Jul 23, 2018 · Updated 7 years ago
- ☆12 · Apr 27, 2018 · Updated 8 years ago
- Causal Feature Selection Tutorial for AMIA2018 ☆12 · Nov 3, 2018 · Updated 7 years ago
- ☆21 · Feb 5, 2020 · Updated 6 years ago
- ☆10 · May 10, 2017 · Updated 8 years ago
- Freddie Mac Single Loan Data Analysis & Machine Learning (Regression / Classification) ☆12 · Jun 11, 2017 · Updated 8 years ago
- Nonparametric estimators of the average treatment effect with doubly-robust confidence intervals and hypothesis tests ☆20 · Jan 4, 2023 · Updated 3 years ago
- ☆13 · Sep 30, 2018 · Updated 7 years ago
- ☆15 · Aug 13, 2024 · Updated last year
- A web app designed to help Penn students find classes and make schedules ☆13 · Oct 25, 2019 · Updated 6 years ago
- From Natural Language Text to Graph Database ☆31 · Mar 3, 2016 · Updated 10 years ago
- HumanJobs is a ChatGPT Plugin that lets ChatGPT create job postings only for humans ☆14 · Apr 15, 2023 · Updated 3 years ago
- Code and data for SciPy 2018 talk on missing data ☆21 · Jun 29, 2018 · Updated 7 years ago
- A collection of Python scripts ☆12 · Feb 7, 2020 · Updated 6 years ago
- Simple storage for stock prices with adjusted prices calculation based on Center for Research in Security Prices (CRSP) standards ☆12 · Feb 15, 2018 · Updated 8 years ago
- ☆15 · Jun 25, 2025 · Updated 10 months ago
- Example routines for compiled-language implementation patterns ☆11 · Nov 22, 2014 · Updated 11 years ago
- Mirror kept for legacy. Moved to https://github.com/llvm/llvm-project ☆17 · Dec 14, 2016 · Updated 9 years ago
- ☆13 · Oct 23, 2018 · Updated 7 years ago
- Welcome to my independent research repository! ☆17 · Nov 18, 2016 · Updated 9 years ago
- A distributed machine learning algorithm for new word discovery. ☆15 · Jul 21, 2014 · Updated 11 years ago
- A key/value database based on SkimpyStash. ☆13 · Jun 11, 2015 · Updated 10 years ago
- Multithreaded HTTP Download Accelerator ☆23 · Jul 27, 2014 · Updated 11 years ago
- Classify Traffic Signs. ☆10 · Jan 31, 2017 · Updated 9 years ago