Netflix / pygenie
☆72 Updated 3 weeks ago
Alternatives and similar repositories for pygenie:
Users who are interested in pygenie are comparing it to the libraries listed below.
- A toolkit providing a uniform interface for connecting to and extracting data from a wide variety of (potentially remote) data stores (in… ☆255 Updated 10 months ago
- transformpy is a Python 2/3 module for doing transforms on "streams" of data ☆29 Updated 7 years ago
- SQLAlchemy dialect for Turbodbc ☆23 Updated 10 months ago
- A wrapper for libhdfs3 to interact with HDFS from Python ☆136 Updated 4 years ago
- A Cookiecutter template for creating Faust projects quickly. ☆70 Updated 2 years ago
- ETLy is an add-on dashboard service on top of Apache Airflow. ☆69 Updated last year
- Apache (Py)Spark type annotations (stub files). ☆116 Updated 2 years ago
- Fast iterative local development and testing of Apache Airflow workflows ☆197 Updated 3 months ago
- REST-like API exposing Airflow data and operations ☆61 Updated 6 years ago
- Fork of aio-libs/aiokafka ☆27 Updated last year
- Thin-client metrics library for use with Atlas and SpectatorD ☆47 Updated last week
- Deploy dask on YARN clusters ☆69 Updated 7 months ago
- Data Brewery is an ETL (Extract-Transform-Load) program that connects to many data sources (cloud services, databases, ...) and manages dat… ☆16 Updated 4 years ago
- Airflow plugin to transfer arbitrary files between operators ☆78 Updated 6 years ago
- Airflow workflow management platform Chef cookbook. ☆71 Updated 5 years ago
- Builds Airflow DAGs from configuration files. Powers all DAGs on the Etsy Data Platform ☆261 Updated last year
- ☆54 Updated 6 years ago
- locopy: Loading/Unloading to Redshift and Snowflake using Python. ☆107 Updated this week
- Deprecated, please use https://github.com/jcrist/skein or https://github.com/dask/dask-yarn instead ☆52 Updated 6 years ago
- Python client for Spark Jobserver Rest API ☆39 Updated 5 years ago
- A Python client for Apache Livy, enabling use of remote Apache Spark clusters. ☆70 Updated 3 years ago
- Serializes data into a JSON format using AVRO schema. ☆137 Updated 3 years ago
- IP Address dtype and block for pandas ☆105 Updated last year
- Dockerized setup for testing code on realistic Hadoop clusters ☆27 Updated 4 years ago
- Code Repository for the EVO-ODAS ☆31 Updated 7 years ago
- As a believer in learning through examples, I have decided to put my own examples of Gremlin queries inside Jupyter Notebooks for people … ☆32 Updated 5 years ago
- A Getting Started Guide for developing and using Airflow Plugins ☆93 Updated 6 years ago
- Asynchronous actions for PySpark ☆47 Updated 3 years ago
- Python Driver for Apache Drill. ☆59 Updated 2 years ago
- Unit and integration testing with PySpark can be tough to figure out; let's make that easier. ☆22 Updated 9 years ago