treeverse / awesome-data-engineering
A curated list of data engineering tools for software developers
☆6Updated 4 years ago
Related projects
Alternatives and complementary repositories for awesome-data-engineering
- An Operator for scheduling and executing NiFi Flows as Jobs on Kubernetes☆53Updated 4 years ago
- Kafka replicator is a tool used to mirror and backup Kafka topics across regions☆15Updated last year
- Fleet Management Simulator using Consul, Nomad, Vault, Terraform, Packer and Go☆17Updated 4 years ago
- NAT server (cluster) with peers bootstrapped using tailscale☆10Updated 4 years ago
- An Apache Hive SerDe (short for serializer/deserializer) for the Ion file format.☆28Updated 8 months ago
- Apiary provides modules which can be combined to create a federated cloud data lake☆36Updated 7 months ago
- Set of tools for creating backups, compaction and restoration of Apache Kafka® Clusters☆18Updated this week
- Data Catalog is a service for indexing parameterized, strongly-typed data artifacts across revisions. It also powers Flyte's memoization s…☆54Updated last year
- Scalable package delivery logistics simulator built using SingleStore and Vectorized Redpanda☆33Updated 8 months ago
- Kubernetes operator providing Ray|Spark|Dask|MPI clusters on-demand☆14Updated last year
- AWS S3 CLI toolkit☆21Updated 4 years ago
- Airbyte is the go-sdk/cdk to help build connectors quickly in Go. This package abstracts much of the "protocol" away from the user a…☆37Updated 8 months ago
- Retrieves cost metrics and core counts from the AWS API and exposes this information via a Prometheus /metrics endpoint.☆13Updated last week
- A starter project to create Arc jobs using the Jupyter Notebook interface☆22Updated 3 years ago
- Python library for CUE https://cuelang.org/☆21Updated 3 years ago
- Using Debezium with WarpStream as a Kafka alternative for CDC☆19Updated 7 months ago
- GKE cluster using Litmus Chaos Engine to validate Zebrium's unsupervised Machine Learning incident detection platform☆17Updated last year
- Cloud Storage Connector integrates Apache Pulsar with cloud storage.☆28Updated this week
- Cluster configuration best practices☆64Updated 3 weeks ago
- Amundsen Gremlin☆20Updated 2 years ago
- Golang-based remote data frames access (over gRPC or HTTP stream)☆27Updated last week
- ☆12Updated 5 years ago
- Dashboard for operating Flink jobs and deployments.☆25Updated 3 weeks ago
- Streaming data changes to a Data Lake with Debezium and Delta Lake pipeline☆72Updated last year
- A Cloud Native Query Engine. Serverless, if it fits your case.☆54Updated last year
- Apache Pinot Golang Client managed by StarTree☆28Updated 7 months ago
- Stream your CSV files to an HTTP API☆12Updated 6 years ago
- Hadoop/Hive/Spark container to perform CI tests☆11Updated 3 years ago