whitfin / s3-concat
Concatenate Amazon S3 files remotely using flexible patterns
☆38 Updated 4 years ago
Alternatives and similar repositories for s3-concat
Users that are interested in s3-concat are comparing it to the libraries listed below
- An HFile-backed Key-Value Server ☆43 Updated 6 years ago
- Compare eventual consistency of object stores ☆176 Updated last year
- Gather metadata about your S3 buckets ☆49 Updated 4 years ago
- UNRELEASED. An opinionated framework for analytics-on-write on event streams using key-value storage ☆14 Updated 10 years ago
- Timberlake is a Job Tracker for Hadoop. ☆177 Updated 5 years ago
- Streaming left joins in Kafka for change data capture ☆52 Updated last year
- A key/value store for serving static batch data ☆174 Updated 2 years ago
- Cantor provides utilities for estimating the cardinality of large sets. ☆84 Updated 3 years ago
- ☆20 Updated 4 years ago
- Example code for building your own MemSQL Streamliner Pipelines ☆23 Updated 8 years ago
- Luigi Plugin for Hubot ☆36 Updated 9 years ago
- dynamically parse protobuf message then convert to avro ☆25 Updated 10 years ago
- Tools for working with parquet, impala, and hive ☆134 Updated 4 years ago
- An analyzer for getting metrics about the contents of an Apache Kafka topic ☆64 Updated 4 years ago
- Run templatable playbooks of SQL scripts in series and parallel on Redshift, PostgreSQL, BigQuery and Snowflake ☆81 Updated 6 months ago
- Use SQL to transform your avro schema/records ☆28 Updated 7 years ago
- A Kafka-Connect Sink for S3 with no Hadoop dependencies. ☆57 Updated 2 years ago
- JSONCDC is now maintained at, ☆90 Updated 7 years ago
- AWS bootstrap scripts for Mozilla's flavoured Spark setup. ☆47 Updated 5 years ago
- Herd-MDL, a turnkey managed data lake in the cloud. See https://finraos.github.io/herd-mdl/ for more information. ☆15 Updated last year
- A system to programmatically run data pipelines ☆224 Updated last week
- Library and worker to handle transfer of data in s3 into redshift. Includes table creation and manipulation, as well as time-based insert… ☆60 Updated 2 years ago
- RDB parsing & dumping library and utility ☆64 Updated 7 months ago
- Python bindings for TrailDB ☆38 Updated 5 years ago
- Parquet Command-line Tools ☆19 Updated 9 years ago
- HyperMinHash: Bringing intersections to HyperLogLog ☆306 Updated 7 years ago
- An interactive CLI tool for managing Kafka topics ☆28 Updated 6 years ago
- Ephemeral Hadoop clusters using Google Compute Platform ☆134 Updated 3 years ago
- logq - Analyzing log files in PartiQL with command-line toolkit, implemented in Rust ☆46 Updated 2 years ago
- A Directed Acyclic Graph task dependency scheduler designed to simplify complex distributed pipelines ☆131 Updated 7 years ago