The repository for the CMU Data Pipeline course. This year's course should use branch 2017
☆40 · May 2, 2017 · Updated 8 years ago
Alternatives and similar repositories for data
Users that are interested in data are comparing it to the libraries listed below
- A Real-time Apache log monitor using Kafka & Spark Streaming, with fake log generator. ☆24 · Feb 19, 2020 · Updated 6 years ago
- A library that will eventually help people wanting to do Data Mining on Twitter ☆23 · Jan 25, 2023 · Updated 3 years ago
- featselector is a feature selector based on statistical analysis and model selection. ☆14 · Mar 4, 2019 · Updated 7 years ago
- Real-time Machine Learning with Apache Spark on Twitter Public Stream ☆68 · Apr 27, 2017 · Updated 8 years ago
- PySpark Spotify ETL ☆17 · Aug 19, 2021 · Updated 4 years ago
- This data analysis provided information for the March 6th, 2018, NYC Open Data Week event hosted by the Two Sigma Data Clinic, "The State… ☆13 · Jan 9, 2025 · Updated last year
- Python interface to bnlearn and other probabilistic graphical model libraries ☆10 · Mar 26, 2020 · Updated 5 years ago
- Data pipeline for Sina Weibo Interaction-prediction ☆54 · Oct 21, 2015 · Updated 10 years ago
- Solutions to the book "Collection of Data Science TakeHome Challenges" in Python. ☆10 · Nov 15, 2017 · Updated 8 years ago
- ☆12 · Apr 27, 2018 · Updated 7 years ago
- Causal Feature Selection Tutorial for AMIA2018 ☆12 · Nov 3, 2018 · Updated 7 years ago
- ☆21 · Feb 5, 2020 · Updated 6 years ago
- Material for Machine Learning Meetup "Machine Learning with Scikit-learn" ☆29 · Jan 21, 2016 · Updated 10 years ago
- Open-source software for tracking and analyzing CarMax vehicle data ☆14 · May 29, 2018 · Updated 7 years ago
- From Natural Language Text to Graph Database ☆31 · Mar 3, 2016 · Updated 10 years ago
- Code and data for SciPy 2018 talk on missing data ☆21 · Jun 29, 2018 · Updated 7 years ago
- A collection of Python scripts ☆12 · Feb 7, 2020 · Updated 6 years ago
- Simple storage for stock prices with adjusted prices calculation based on Center for Research in Security Prices (CRSP) standards ☆12 · Feb 15, 2018 · Updated 8 years ago
- A Lightweight Graph Processing Framework for Multi-GPUs ☆14 · Apr 15, 2015 · Updated 10 years ago
- Interview record ☆15 · Mar 16, 2017 · Updated 9 years ago
- Companion source code for GTC 2014 talk ☆11 · Mar 25, 2014 · Updated 11 years ago
- Welcome to my independent research repository! ☆17 · Nov 18, 2016 · Updated 9 years ago
- ☆26 · Jul 1, 2021 · Updated 4 years ago
- Housing loan risk assessment from its origination data ☆20 · Sep 27, 2023 · Updated 2 years ago
- Classify Traffic Signs. ☆10 · Jan 31, 2017 · Updated 9 years ago
- Symbolic range analysis for LLVM. ☆12 · Jan 10, 2016 · Updated 10 years ago
- Simple spatio-temporal windowing in Kafka Streams ☆13 · Jul 14, 2016 · Updated 9 years ago
- ☆40 · Sep 3, 2015 · Updated 10 years ago
- Go Share your TimeSeries/NameSpace/KeyVal DataStore (using leveldb) over HTTP &/or ZeroMQ ☆62 · Oct 28, 2015 · Updated 10 years ago
- Parallel programs with OpenMPI ☆10 · Apr 1, 2015 · Updated 10 years ago
- Source code & CAD drawings of NimbRo-OP ☆27 · Oct 30, 2012 · Updated 13 years ago
- ☆13 · Jan 16, 2019 · Updated 7 years ago
- Algorithms from the book "Elements of Statistical Learning", implemented in Python ☆12 · Mar 29, 2015 · Updated 10 years ago
- The project implemented some machine learning algorithms on Spark which is written in Scala and it also included standalone implementatio… ☆16 · Jan 3, 2022 · Updated 4 years ago
- BigDataBench Spark workloads ☆11 · Jul 15, 2016 · Updated 9 years ago
- a c89 compiler, need total test. ☆29 · Jan 20, 2018 · Updated 8 years ago
- Jupyter notebook containing code from text preprocessing blog post ☆10 · Nov 29, 2016 · Updated 9 years ago
- Todo App built with React, TypeScript, View Transition API and Context/Reducer Pattern. ☆13 · Jun 30, 2024 · Updated last year
- Linux kernel modules for Device File-based I/O Virtualization. ☆19 · Nov 3, 2014 · Updated 11 years ago