data-integrations / wrangler
Wrangler Transform: A DMD system for transforming Big Data
☆105 Updated this week
Alternatives and similar repositories for wrangler
Users that are interested in wrangler are comparing it to the libraries listed below
- Cask Hydrator Plugins Repository ☆68 Updated last week
- Apache-Spark based Data Flow(ETL) Framework which supports multiple read, write destinations of different types and also support multiple… ☆26 Updated 4 years ago
- A library for Spark DataFrame using MinIO Select API ☆98 Updated 5 years ago
- A Spark-based data comparison tool at scale which facilitates software development engineers to compare a plethora of pair combinations o… ☆51 Updated last month
- DBeam exports SQL tables into Avro files using JDBC and Apache Beam ☆195 Updated 2 weeks ago
- Reference implementation for real-time Data Lineage tracking for BigQuery using Audit Logs, ZetaSQL and Dataflow. ☆146 Updated last year
- Rokku project. This project acts as a proxy on top of any S3 storage solution providing services like authentication, authorization, shor… ☆69 Updated 5 months ago
- Data ingestion library for Amundsen to build graph and search index ☆205 Updated last year
- Apache DataLab (incubating) ☆152 Updated last year
- Snowflake Data Source for Apache Spark. ☆226 Updated last month
- Circus Train is a dataset replication tool that copies Hive tables between clusters and clouds. ☆88 Updated last year
- DataQuality for BigData ☆144 Updated last year
- Smart Automation Tool for building modern Data Lakes and Data Pipelines ☆124 Updated last week
- Superglue is a lineage-tracking tool built to help visualize the propagation of data through complex pipelines composed of tables, jobs … ☆158 Updated 2 years ago
- Quark is a data virtualization engine over analytic databases. ☆98 Updated 8 years ago
- Build configuration-driven ETL pipelines on Apache Spark ☆160 Updated 2 years ago
- ☆81 Updated last year
- Flowman is an ETL framework powered by Apache Spark. With its declarative approach, Flowman simplifies the development of complex data pi… ☆96 Updated 2 weeks ago
- A temporary home for LinkedIn's changes to Apache Iceberg (incubating) ☆61 Updated 8 months ago
- An Operator for scheduling and executing NiFi Flows as Jobs on Kubernetes ☆53 Updated 5 years ago
- Yet Another Spark SQL JDBC/ODBC server based on the PostgreSQL V3 protocol ☆34 Updated 2 years ago
- Egeria's Guidance on Governance as well as large media files such as presentations and movies ☆105 Updated 2 years ago
- Jumbune, an open source BigData APM & Data Quality Management Platform for Data Clouds. Enterprise feature offering is available at http:… ☆70 Updated 2 years ago
- An open source framework for building data analytic applications. ☆780 Updated this week
- ACID Data Source for Apache Spark based on Hive ACID ☆97 Updated 4 years ago
- A plugin to Apache Airflow to allow you to run Spark Submit Commands as an Operator ☆73 Updated 5 years ago
- ☆39 Updated 6 years ago
- StreamLine - Streaming Analytics ☆164 Updated last year
- Big Data Processing Framework - Unified Data API or SQL on Any Storage ☆246 Updated 3 weeks ago
- Oozie Workflow to Airflow DAGs migration tool ☆87 Updated 5 months ago