linkedin / data-integration-library
The Data Integration Library project provides a library of generic components based on a multi-stage architecture for data ingress and egress.
☆30 · Updated 5 months ago
Alternatives and similar repositories for data-integration-library:
Users interested in data-integration-library are comparing it to the libraries listed below:
- Mirror of Apache NiFi Flow Design System ☆44 · Updated last year
- Data abstraction, storage, discovery, and serving system ☆31 · Updated 3 months ago
- A Spark-based data comparison tool at scale which facilitates software development engineers to compare a plethora of pair combinations o… ☆49 · Updated last year
- Dione - a Spark and HDFS indexing library ☆50 · Updated 9 months ago
- LinkedIn's version of Apache Calcite ☆22 · Updated 2 months ago
- Data Profiler for AWS Glue Data Catalog application as described in the AWS Big Data Blog post "Build an automatic data profiling and rep… ☆19 · Updated 4 years ago
- Wrangler Transform: A DMD system for transforming Big Data ☆90 · Updated this week
- Jumbune, an open source BigData APM & Data Quality Management Platform for Data Clouds. Enterprise feature offering is available at http:… ☆71 · Updated 2 years ago
- An Operator for scheduling and executing NiFi Flows as Jobs on Kubernetes ☆53 · Updated 4 years ago
- Cask Hydrator Plugins Repository ☆67 · Updated this week
- CDAP UI ☆19 · Updated this week
- Drools processor for Apache NiFi ☆38 · Updated 5 years ago
- Demonstration of a Hive Input Format for Iceberg ☆26 · Updated 3 years ago
- A curated list of awesome lakehouse frameworks, applications, etc. ☆21 · Updated 3 weeks ago
- Flowman is an ETL framework powered by Apache Spark. With its declarative approach, Flowman simplifies the development of complex data pi… ☆93 · Updated this week
- A library for strong, schema based conversion between 'natural' JSON documents and Avro ☆18 · Updated 10 months ago
- Apache-Spark based Data Flow (ETL) Framework which supports multiple read, write destinations of different types and also support multiple… ☆26 · Updated 3 years ago
- Apiary provides modules which can be combined to create a federated cloud data lake ☆36 · Updated 9 months ago
- An Example Dremio ARP driven connector that supports SQLite ☆19 · Updated 9 months ago
- Amundsen Gremlin ☆20 · Updated 2 years ago
- Apache Flagon is a suite of comprehensive, thin-client behavioral logging tools ☆25 · Updated 2 months ago
- Circus Train is a dataset replication tool that copies Hive tables between clusters and clouds. ☆86 · Updated 10 months ago
- Shunting Yard is a real-time data replication tool that copies data between Hive Metastores. ☆20 · Updated 3 years ago
- Scalable CDC Pattern Implemented using PySpark ☆18 · Updated 5 years ago
- Set of tools for creating backups, compaction and restoration of Apache Kafka® Clusters ☆19 · Updated this week