treeverse / awesome-data-engineering
A curated list of data engineering tools for software developers
☆6 Updated 4 years ago
Alternatives and similar repositories for awesome-data-engineering:
Users who are interested in awesome-data-engineering are comparing it to the repositories listed below.
- NAT server (cluster) with peers bootstrapped using Tailscale ☆10 Updated 4 years ago
- An Operator for scheduling and executing NiFi Flows as Jobs on Kubernetes ☆53 Updated 4 years ago
- Apache Pinot Golang Client managed by StarTree ☆28 Updated 9 months ago
- Fleet Management Simulator using Consul, Nomad, Vault, Terraform, Packer and Go ☆17 Updated 4 years ago
- Apiary provides modules which can be combined to create a federated cloud data lake ☆36 Updated 9 months ago
- An engine for fast time series data aggregation ☆12 Updated 5 years ago
- Kafka replicator is a tool used to mirror and back up Kafka topics across regions ☆15 Updated last year
- Explore Apache Kafka data pipelines in Kubernetes. ☆45 Updated 2 months ago
- Streaming data changes to a Data Lake with Debezium and Delta Lake pipeline ☆74 Updated last year
- A custom ContentRepository implementation for NiFi to persist data to MinIO Object Storage ☆34 Updated 2 years ago
- Kubernetes operator providing Ray|Spark|Dask|MPI clusters on-demand ☆14 Updated last year
- OpenFaaS function demonstrating how CloudEvents 0.1 may be handled within the function itself. ☆13 Updated 6 years ago
- ☆36 Updated last month
- Cloud Storage Connector integrates Apache Pulsar with cloud storage. ☆27 Updated last week
- Aiven's S3 Sink Connector for Apache Kafka® ☆66 Updated 4 months ago
- Set of tools for creating backups, compaction and restoration of Apache Kafka® Clusters ☆20 Updated this week
- Using Debezium with WarpStream as a Kafka alternative for CDC ☆19 Updated 10 months ago
- This is a basic Apache Pinot example for ingesting real-time MySQL change logs using Debezium ☆27 Updated 4 years ago
- Telecom scenarios implemented with streaming techniques ☆11 Updated last year
- Data Catalog is a service for indexing parameterized, strongly-typed data artifacts across revisions. It also powers Flyte's memoization s… ☆54 Updated last year
- Presto cluster on top of Kubernetes ☆18 Updated 5 years ago
- Demos using Conduktor Gateway ☆16 Updated 9 months ago
- Bullet is a streaming query engine that can be plugged into any singular data stream using a Stream Processing framework like Apache Stor… ☆41 Updated 2 years ago
- Kubernetes API to MQTT connector service ☆23 Updated 2 years ago
- Proxy requests to Kafka with middleware. ☆27 Updated 6 years ago
- ☆22 Updated 5 years ago
- Data migration engine for YugabyteDB database ☆41 Updated this week
- In-Memory Analytics for Kafka using DuckDB ☆90 Updated this week
- An Apache Hive SerDe (short for serializer/deserializer) for the Ion file format. ☆28 Updated 10 months ago