lucrussell / docker-luigi
A data engineering pipeline for harvesting top author data from Medium
☆16 Updated 6 years ago
Alternatives and similar repositories for docker-luigi:
Users who are interested in docker-luigi are comparing it to the libraries listed below.
- Fast, resilient and reproducible data analysis with cached SQL queries ☆30 Updated last year
- A package for data science practitioners. This library implements a number of helpful, common data transformations with a scikit-learn fr… ☆57 Updated 3 years ago
- ☆16 Updated 4 years ago
- Data exploration library with a pandas-like API ☆74 Updated 4 years ago
- Python DataFrame with fast insert and appends ☆75 Updated last year
- Extend pandas to_sql function to perform multi-threaded, concurrent "insert or update" command in memory ☆84 Updated 10 months ago
- Automated Machine Learning: go from 'X' to 'y' without effort. ☆46 Updated 5 years ago
- Utilities for creating ETL pipelines with mara ☆36 Updated 2 years ago
- Slack notifications for the Luigi workflow manager ☆46 Updated 3 years ago
- Just a boilerplate for PySpark and Flask ☆35 Updated 6 years ago
- Some wrappers around python modules for simplifying the data exploration process. ☆13 Updated 2 months ago
- Simple, light-weight data frames for Python ☆25 Updated last week
- Send summary messages of your Luigi jobs to Slack ☆46 Updated 5 years ago
- Materials for the Princeton Quant Trading Conference Chicago 2018 workshop ☆13 Updated 6 years ago
- Marshmallow Schema generator for Pandas DataFrames ☆24 Updated 4 years ago
- A xlsx and html rendering library for rendering data available in Pandas DataFrames. ☆26 Updated 8 months ago
- Set-oriented Operations in Pandas ☆24 Updated 4 years ago
- Slides produced by Engineers and Data Scientists of Blue Yonder ☆50 Updated 5 years ago
- T4 is now in production as Quilt 3 ☆64 Updated 5 years ago
- A luigi powered analytics / warehouse stack ☆87 Updated 7 years ago
- Automated Exploratory Data Analysis. Simplifying Data Exploration ☆34 Updated 4 years ago
- ETLy is an add-on dashboard service on top of Apache Airflow. ☆69 Updated last year
- An easy-to-use Python wrapper for the Don Best Sports Data API. ☆16 Updated 2 years ago
- Server that simplifies connecting pandas to a realtime data feed, testing hypothesis and visualizing results in a web browser ☆33 Updated last year
- Code repository supporting the medium blog ☆13 Updated 4 years ago
- Fuzzy joins for python pandas - easily join different datasets ☆59 Updated 4 years ago
- A basic introduction to machine learning (one day training). ☆16 Updated 7 years ago
- This is all my random garbage. ☆26 Updated last year
- Airflow plugin to transfer arbitrary files between operators ☆78 Updated 6 years ago
- CLI tool to launch Spark jobs on AWS EMR ☆67 Updated last year