whitfin / s3-concat
Concatenate Amazon S3 files remotely using flexible patterns
☆38Updated 4 years ago
Alternatives and similar repositories for s3-concat
Users interested in s3-concat are comparing it to the libraries listed below.
- Utilities and tools based around Amazon S3 to provide convenience APIs in a CLI☆56Updated 4 years ago
- Streaming left joins in Kafka for change data capture☆52Updated last year
- Gather metadata about your S3 buckets☆49Updated 4 years ago
- Cantor provides utilities for estimating the cardinality of large sets.☆83Updated 3 years ago
- ☆20Updated 4 years ago
- An HFile-backed Key-Value Server☆42Updated 6 years ago
- Tools for working with parquet, impala, and hive☆134Updated 4 years ago
- UNRELEASED. An opinionated framework for analytics-on-write on event streams using key-value storage☆14Updated 9 years ago
- Run templatable playbooks of SQL scripts in series and parallel on Redshift, PostgreSQL, BigQuery and Snowflake☆81Updated 3 months ago
- logq - Analyzing log files in PartiQL with command-line toolkit, implemented in Rust☆46Updated 2 years ago
- A Directed Acyclic Graph task dependency scheduler designed to simplify complex distributed pipelines☆131Updated 7 years ago
- Timberlake is a Job Tracker for Hadoop.☆177Updated 5 years ago
- Provides a Pythonic interface for reading and writing Avro schemas☆27Updated 3 years ago
- t-digest module for Redis☆73Updated 4 years ago
- AWS bootstrap scripts for Mozilla's flavoured Spark setup.☆47Updated 5 years ago
- An introduction to Apache Spark DataFrames.☆41Updated 10 years ago
- Compare eventual consistency of object stores☆174Updated last year
- Fast JSON to Avro converter☆61Updated 6 years ago
- An approximate frequency counter Redis module☆46Updated 6 years ago
- Bullet is a streaming query engine that can be plugged into any singular data stream using a Stream Processing framework like Apache Stor…☆41Updated 2 years ago
- A straightforward CLI Kafka producer☆20Updated 8 years ago
- Compile JSON Schema into Avro and BigQuery schemas☆44Updated last year
- A system to programmatically run data pipelines☆222Updated 3 months ago
- Convert JSON files to Parquet using PyArrow☆97Updated last year
- dynamically parse protobuf message then convert to avro☆25Updated 10 years ago
- A Kafka-Connect Sink for S3 with no Hadoop dependencies.☆57Updated 2 years ago
- Lighthouse is a library for data lakes built on top of Apache Spark. It provides high-level APIs in Scala to streamline data pipelines an…☆61Updated 11 months ago
- A schema store service that tracks and manages all the schemas used in the Data Pipeline☆87Updated 4 years ago
- A tool to run benchmarks on Kafka clusters☆100Updated 2 years ago
- Simple Scalable Time Series Database☆131Updated 2 years ago