redapt / pyspark-s3-parquet-example
This repo demonstrates how to load a sample Parquet-formatted file from an AWS S3 bucket. A Python job is then submitted to an Apache Spark instance running on AWS EMR, which uses a SQLContext to create a temporary table from a DataFrame. SQL queries can then be run against the temporary table.
☆19 · Jun 23, 2016 · Updated 9 years ago
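The workflow described above can be sketched roughly as follows. This is a minimal illustration and not the repository's actual code: the helper names `s3_parquet_uri` and `run_query`, the bucket, the key, and the table name are all hypothetical. It also uses `SparkSession` and `createOrReplaceTempView` (Spark 2.0 and later) in place of the `SQLContext` the description mentions, and it assumes the `s3://` scheme as handled by EMRFS on EMR.

```python
def s3_parquet_uri(bucket, key, scheme="s3"):
    """Build an S3 URI for a Parquet object (hypothetical helper).

    On EMR the "s3" scheme is served by EMRFS; outside EMR, "s3a" is
    the usual choice with the Hadoop S3A connector.
    """
    return f"{scheme}://{bucket}/{key.lstrip('/')}"


def run_query(parquet_uri, table_name, query):
    """Load a Parquet file into a DataFrame, register it as a temporary
    view, and run a SQL query against that view (hypothetical helper)."""
    # Imported lazily so this module can be imported without Spark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("s3-parquet-example").getOrCreate()

    # Read the Parquet data from S3 into a DataFrame.
    df = spark.read.parquet(parquet_uri)

    # Expose the DataFrame to Spark SQL under the given name.
    df.createOrReplaceTempView(table_name)

    # Return the query result as a new DataFrame.
    return spark.sql(query)
```

In a script submitted with `spark-submit`, a call such as `run_query(s3_parquet_uri("example-bucket", "data/sample.parquet"), "sample", "SELECT * FROM sample LIMIT 10").show()` would print the first rows, assuming the cluster has read access to that (placeholder) bucket.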
Alternatives and similar repositories for pyspark-s3-parquet-example
Users that are interested in pyspark-s3-parquet-example are comparing it to the libraries listed below
- Example of using Airflow to schedule downloading data from S3 and launching spark jobs ☆15 · Oct 17, 2016 · Updated 9 years ago
- Freddie Mac Single Loan Data Analysis & Machine Learning (Regression / Classification) ☆12 · Jun 11, 2017 · Updated 8 years ago
- Built a stream processing data pipeline to get data from disparate systems into a dashboard using Kafka as an intermediary. ☆29 · Aug 14, 2023 · Updated 2 years ago
- exemplar code to download all option chains for a symbol using pyetrade (V1 Etrade API) ☆10 · Sep 28, 2021 · Updated 4 years ago
- CSS & HTML on Python Easily ☆11 · Sep 23, 2024 · Updated last year
- The best Python package for comparing two dataframes ☆11 · Dec 29, 2021 · Updated 4 years ago
- Kubernetes Volume Snapshot Controller using Custom Resource Definition ☆12 · Sep 20, 2017 · Updated 8 years ago
- Kubernetes Container Storage Interface (CSI) plug-in for Oracle ZFS Storage Appliance. ☆14 · Jul 2, 2024 · Updated last year
- Python3, NetworkX, Java, MLlib, Spark, Cassandra, Neo4j 3.0, Gephi, Docker ☆11 · Jul 18, 2017 · Updated 8 years ago
- Python oriented toward data analysis ☆13 · Sep 22, 2025 · Updated 4 months ago
- Automatically perform exploratory data analysis, and generate a report in Word '.docx' format. ☆10 · Jan 8, 2026 · Updated last month
- ☆11 · Jun 12, 2019 · Updated 6 years ago
- Configuration system geared towards Python ML projects ☆11 · Apr 30, 2023 · Updated 2 years ago
- This is a demo of a dataframe with editable cells, powered by `streamlit-aggrid` from Pablo Fonseca. You can edit the cells by clicking o… ☆44 · Jun 9, 2023 · Updated 2 years ago
- AWS S3 plugin for dvc ☆13 · Updated this week
- Kubernetes LDAP authentication service written in Go. ☆10 · May 4, 2019 · Updated 6 years ago
- The proposed solution shows an approach to unify and centralize logs across different compute platforms like EC2, ECS, EKS and Lambda wi… ☆14 · Oct 17, 2023 · Updated 2 years ago
- Extension to Python-Markdown to translate pydantic's model fields to markdown table ☆12 · Apr 19, 2024 · Updated last year
- An app that makes it easy to connect to a user's data warehouse and make a dashboard out of it. ☆15 · Feb 6, 2022 · Updated 4 years ago
- Trello clone GraphQL Node.js backend ☆11 · Aug 23, 2017 · Updated 8 years ago
- The Meteor 1.4 For Everyone Tutorial Series Code ☆11 · Sep 17, 2016 · Updated 9 years ago
- Simple library for working with passwords in Go (golang). ☆13 · Feb 18, 2016 · Updated 9 years ago
- Snapshot script for Ceph RBD and Samba vfs shadow_copy2 ☆15 · Feb 3, 2017 · Updated 9 years ago
- ☆18 · Sep 20, 2023 · Updated 2 years ago
- Ceph Cookbook – Second Edition, published by Packt ☆12 · Jan 14, 2021 · Updated 5 years ago
- containerized NFS Ganesha daemon ☆10 · Aug 15, 2016 · Updated 9 years ago
- A proposal for Helm 3 using CRDs and a custom controller ☆13 · Mar 8, 2018 · Updated 7 years ago
- Integration of Clinical Embeddings with Neural ODEs ☆11 · Jan 6, 2025 · Updated last year
- Personal expense tracking application ☆10 · Nov 10, 2018 · Updated 7 years ago
- This construct builds some elements for you to quickly launch an EMR Serverless application. After submitting the Emr Serverless job, you… ☆11 · Nov 18, 2025 · Updated 2 months ago
- Run Tensorflow and Keras with GPU support on Kubernetes ☆13 · Mar 21, 2017 · Updated 8 years ago
- Kafka plugin for KairosDB ☆11 · Feb 28, 2018 · Updated 7 years ago
- Algorithmic solutions to optimize inference for convolution-based image upsampling. Coded for clarity, not speed. ☆10 · Aug 26, 2022 · Updated 3 years ago
- A collection of python utility functions ☆11 · Updated this week
- Database of annotated field recording samples that can be used for training audio labelling algorithms ☆10 · Feb 1, 2019 · Updated 7 years ago
- Generates a tree of an S3 bucket contents ☆10 · Sep 18, 2020 · Updated 5 years ago
- Privacy-preserving data sandbox for on-premise computation ☆11 · Jun 15, 2021 · Updated 4 years ago
- Interactive Graphic for Exploring Liver Function Data in Clinical Trials ☆11 · Mar 4, 2023 · Updated 2 years ago
- Advanced PDF parsing for python ☆12 · Jan 21, 2025 · Updated last year