The repository for the CMU Data Pipeline course. This year's course should use branch 2017
☆40May 2, 2017Updated 9 years ago
Alternatives and similar repositories for data
Users that are interested in data are comparing it to the libraries listed below.
- Python Scripts for data pre- and post-processing (parsing, cleaning and analysis)☆13Jun 9, 2011Updated 15 years ago
- R-implementation of a Markov-Modulated Poisson Process for unsupervised event detection.☆15Dec 26, 2015Updated 10 years ago
- Counting Twitter hashtags using Spark Streaming and Cassandra☆41Feb 16, 2015Updated 11 years ago
- Fake HTTP log generator module, test if your monitor system can survive under the log spikes.☆37Apr 13, 2026Updated 2 months ago
- Twitter hashtag tracking, analysis and classification.☆38Feb 19, 2020Updated 6 years ago
- featselector is a feature selector based on statistical analysis and model selection.☆14Mar 4, 2019Updated 7 years ago
- Flask-based application using MySQL, MongoDB and Neo4j for storing video data and provides interface to search video and show related vid…☆11Apr 23, 2017Updated 9 years ago
- This library is a wrapper for sklearn and works with data stored using Pandas module.☆17Mar 2, 2016Updated 10 years ago
- Tutorials and samples that show you how to get the most out of IBM Analytics for Apache Spark☆78Mar 16, 2018Updated 8 years ago
- Real-time Machine Learning with Apache Spark on Twitter Public Stream☆68Apr 27, 2017Updated 9 years ago
- Pyspark Spotify ETL☆17Aug 19, 2021Updated 4 years ago
- Genetic Algorithm Feature Engineering☆15Oct 3, 2017Updated 8 years ago
- This repo demonstrates how to load a sample Parquet formatted file from an AWS S3 Bucket. A python job will then be submitted to a Apach…☆19Jun 23, 2016Updated 9 years ago
- python interface to bnlearn and other probabilistic graphical model libraries☆10Mar 26, 2020Updated 6 years ago
- We use policy gradient to help agents learn optimal policies in a competitive multi-agent contextual bandit setting☆12Mar 9, 2018Updated 8 years ago
- CEVAE with VampPrior☆11Jul 18, 2018Updated 7 years ago
- Solutions to the book "Collection of Data Science TakeHome Challenges" in Python.☆10Nov 15, 2017Updated 8 years ago
- My Data Engineering project @ Insight Data Science☆10Jul 23, 2018Updated 7 years ago
- Nonparametric estimators of the average treatment effect with doubly-robust confidence intervals and hypothesis tests☆20Jan 4, 2023Updated 3 years ago
- ☆13Sep 30, 2018Updated 7 years ago
- From Natural Language Text to Graph Database☆31Mar 3, 2016Updated 10 years ago
- github upload file☆16Sep 20, 2016Updated 9 years ago
- Code and data for SciPy 2018 talk on missing data☆21Jun 29, 2018Updated 7 years ago
- My talk at Strata 2014 in Santa Clara, CA☆73Feb 18, 2014Updated 12 years ago
- A Lightweight Graph Processing Framework for Multi-GPUs☆14Apr 15, 2015Updated 11 years ago
- Interview record☆15Mar 16, 2017Updated 9 years ago
- Companion source code for GTC 2014 talk☆11Mar 25, 2014Updated 12 years ago
- ☆13Oct 23, 2018Updated 7 years ago
- Welcome to my independent research repository!☆17Nov 18, 2016Updated 9 years ago
- Doing research on top of Jalangi☆12Sep 9, 2016Updated 9 years ago
- A distributed machine learning algorithm for new word discovery.☆15Jul 21, 2014Updated 11 years ago
- Multithreaded HTTP Download Accelerator☆23Jul 27, 2014Updated 11 years ago
- Housing loan risk assessment from its origination data☆21Sep 27, 2023Updated 2 years ago
- Classify Traffic Signs.☆10Jan 31, 2017Updated 9 years ago
- ☆13May 20, 2020Updated 6 years ago
- Simple spatio-temporal windowing in Kafka Streams☆13Jul 14, 2016Updated 9 years ago
- This is a repository created by Lei Huang to record Leetcode SQL practice.☆17Jun 27, 2020Updated 5 years ago
- Go Share your TimeSeries/NameSpace/KeyVal DataStore (using leveldb) over HTTP &/or ZeroMQ☆62Oct 28, 2015Updated 10 years ago
- ChatGPT Chrome Extension using Reactjs and TailwindCSS☆11Jan 5, 2023Updated 3 years ago