whitfin / s3-concat
Concatenate Amazon S3 files remotely using flexible patterns
☆38Updated 4 years ago
Alternatives and similar repositories for s3-concat
Users that are interested in s3-concat are comparing it to the libraries listed below
- Streaming left joins in Kafka for change data capture☆52Updated last year
- A system to programmatically run data pipelines☆225Updated last month
- Example code for building your own MemSQL Streamliner Pipelines☆23Updated 8 years ago
- A Directed Acyclic Graph task dependency scheduler designed to simplify complex distributed pipelines☆132Updated 7 years ago
- Gather metadata about your S3 buckets☆49Updated 4 years ago
- UNRELEASED. An opinionated framework for analytics-on-write on event streams using key-value storage☆14Updated 10 years ago
- An approximate frequency counter Redis module☆46Updated 6 years ago
- A schema store service that tracks and manages all the schemas used in the Data Pipeline☆88Updated 4 years ago
- dynamically parse protobuf message then convert to avro☆25Updated 10 years ago
- JSONCDC is now maintained at,☆90Updated 7 years ago
- Cantor provides utilities for estimating the cardinality of large sets.☆84Updated 3 years ago
- JSONs -> JSON Schema☆153Updated 5 years ago
- Convert JSON files to Parquet using PyArrow☆97Updated last year
- ☆19Updated 7 years ago
- Ephemeral Hadoop clusters using Google Compute Platform☆134Updated 3 years ago
- Run templatable playbooks of SQL scripts in series and parallel on Redshift, PostgreSQL, BigQuery and Snowflake☆81Updated 7 months ago
- Twitter Streaming API Example with Kafka Streams in Scala☆49Updated 9 years ago
- CLI tool to launch Spark jobs on AWS EMR☆67Updated 2 years ago
- An interactive CLI tool for managing Kafka topics☆28Updated 6 years ago
- Use SQL to transform your avro schema/records☆28Updated 7 years ago
- Provides a Pythonic interface for reading and writing Avro schemas☆27Updated 3 years ago
- Luigi Plugin for Hubot☆36Updated 9 years ago
- Serverless query engine☆141Updated 2 years ago
- Minipipe: a minimal end-to-end data pipeline☆34Updated 9 years ago
- Paper: A Zero-rename committer for object stores☆20Updated last month
- A toolset to streamline running spark python on EMR☆20Updated 9 years ago
- Lighthouse is a library for data lakes built on top of Apache Spark. It provides high-level APIs in Scala to streamline data pipelines an…☆62Updated last year
- Streaming data changes to a Data Lake with Debezium and Delta Lake pipeline☆76Updated 2 years ago
- Parquet Command-line Tools☆19Updated 9 years ago
- Copy tabular data between databases, CSV files and cloud storage☆239Updated 2 weeks ago