redsymbol / csv2parquet
Create Parquet files from CSV
☆67 · Updated 7 years ago
Related projects:
- Apache Spark AWS Lambda Executor (SAMBA) ☆44 · Updated 6 years ago
- Export Redshift data and convert to Parquet for use with Redshift Spectrum or other data warehouses. ☆116 · Updated last year
- AWS bootstrap scripts for Mozilla's flavoured Spark setup. ☆46 · Updated 4 years ago
- Example of an Airflow plugin ☆49 · Updated 8 years ago
- Airflow workflow management platform Chef cookbook. ☆67 · Updated 5 years ago
- The open source version of the Amazon Athena documentation. To submit feedback & requests for changes, submit issues in this repository, … ☆86 · Updated last year
- Redshift Ops Console ☆93 · Updated 8 years ago
- Herd is a managed data lake for the cloud. The Herd unified data catalog helps separate storage from compute in the cloud. Manage petabyt… ☆135 · Updated last year
- PyAthenaJDBC is an Amazon Athena JDBC driver wrapper for the Python DB API 2.0 (PEP 249). ☆95 · Updated last year
- Scripts and instructions to facilitate running deep learning tasks on Amazon EMR ☆62 · Updated 10 months ago
- A curated list of all the awesome examples, articles, tutorials and videos for Apache Airflow. ☆96 · Updated 3 years ago
- Test suite to document the behavior of Spark ☆21 · Updated 3 years ago
- A collection of Airflow sample workflows for data processing on AWS ☆12 · Updated 6 years ago
- A Spark-based data comparison tool at scale which facilitates software development engineers to compare a plethora of pair combinations o… ☆49 · Updated 8 months ago
- Apache Spark on AWS Lambda ☆151 · Updated last year
- A Luigi-powered analytics / warehouse stack ☆87 · Updated 7 years ago
- Spark-cloud is a set of scripts for starting Spark clusters on EC2 ☆12 · Updated 8 years ago
- DataPipeline for humans. ☆252 · Updated 2 years ago
- A prototype of Hive UDFs/UDTFs that execute nested SQL queries within rows. ☆54 · Updated 9 years ago
- Infrastructure code to run notebooks on some EC2 nodes ☆10 · Updated 6 years ago
- Autoscaling EMR clusters and Kinesis streams on Amazon Web Services (AWS) ☆47 · Updated 9 months ago
- Airflow plugin to transfer arbitrary files between operators ☆78 · Updated 5 years ago
- CLI tool to launch Spark jobs on AWS EMR ☆67 · Updated 11 months ago
- Luigi Plugin for Hubot ☆35 · Updated 8 years ago
- Scheduled task execution on top of AWS Data Pipeline ☆43 · Updated 9 years ago