cloudera-labs/envelope

Build configuration-driven ETL pipelines on Apache Spark

☆162

Alternatives and similar repositories for envelope

Users who are interested in envelope are comparing it to the libraries listed below.

AbsaOSS / spark-hofs
Scala API for Apache Spark SQL high-order functions
☆15 · Aug 4, 2023 · Updated 2 years ago
cloudera-labs / cloudera.exe
An Ansible collection of utilities and other resources for Cloudera Platform deployments
☆13 · Jul 15, 2026 · Updated 2 weeks ago
noleme / noleme-flow
A library enabling DAG structuring of data processing programs such as ETLs
☆18 · Jul 19, 2026 · Updated last week
KeithSSmith / spark-compaction
File compaction tool that runs on top of the Spark framework.
☆59 · Apr 17, 2019 · Updated 7 years ago
YotpoLtd / metorikku
A simplified, lightweight ETL Framework based on Apache Spark
☆588 · Jan 24, 2024 · Updated 2 years ago
konrads / spark-etl
Set of ETL utils for Spark
☆15 · May 4, 2020 · Updated 6 years ago
bernhard-42 / Spark-ETL-Atlas
A small project to show how to add lineage to Atlas when using Spark as ETL tool
☆12 · Nov 29, 2016 · Updated 9 years ago
BenFradet / spark-kafka-writer
Write your Spark data to Kafka seamlessly
☆172 · Jul 10, 2024 · Updated 2 years ago
dstreev / hive_llap_calculator
Memory / Configuration Calculator for Hive LLAP
☆14 · Jul 18, 2020 · Updated 6 years ago
piotr-kalanski / data-model-generator
Data model generator based on Scala case classes
☆29 · Nov 5, 2020 · Updated 5 years ago
phrocker / nifi-datasynthesizer
Apache NiFi Data Synthesizer
☆15 · Aug 3, 2023 · Updated 2 years ago
Comcast / ActorServiceRegistry
☆14 · Aug 4, 2016 · Updated 9 years ago
kevdoran / fdlc-demo
A repository used in a NiFi Registry demo
☆13 · Mar 11, 2020 · Updated 6 years ago
maropu / spark-sql-server
Yet Another Spark SQL JDBC/ODBC server based on the PostgreSQL V3 protocol
☆34 · Sep 8, 2022 · Updated 3 years ago
hortonworks / streamline
StreamLine - Streaming Analytics
☆167 · Aug 27, 2023 · Updated 2 years ago
datasphere-oss / datasphere-service
an open source dataworks platform
☆20 · Jun 4, 2021 · Updated 5 years ago
japila-books / spark-structured-streaming-internals
The Internals of Spark Structured Streaming
☆420 · Mar 3, 2026 · Updated 4 months ago
mayur2810 / sope
Apache Spark ETL Utilities
☆40 · Oct 23, 2024 · Updated last year
ottogroup / schedoscope
Schedoscope is a scheduling framework for painfree agile development, testing, (re)loading, and monitoring of your datahub, lake, or what…
☆98 · Nov 14, 2019 · Updated 6 years ago
vngrs / spark-etl
Apache Spark based ETL Engine
☆71 · Oct 18, 2016 · Updated 9 years ago
asdaraujo / edge2ai-workshop
Edge2AI Workshop
☆71 · Jun 11, 2025 · Updated last year
yahoo / sherlock
Sherlock is an anomaly detection service built on top of Druid
☆158 · Dec 2, 2024 · Updated last year
Chaffelson / whoville
An opinionated auto-deployer for the Hortonworks Platform
☆34 · Feb 11, 2021 · Updated 5 years ago
amient / affinity
Library and a Framework for building fast, scalable, fault-tolerant Data APIs based on Akka, Avro, ZooKeeper and Kafka
☆25 · Oct 16, 2020 · Updated 5 years ago
mravi / hbase-connect-kafka
Capture changes of HBase to Kafka
☆30 · May 3, 2016 · Updated 10 years ago
cloudera-labs / cloudera.cluster
An Ansible collection for Cloudera Platform for on-premise and cloud Datahubs
☆38 · Aug 26, 2025 · Updated 11 months ago
jrkinley-zz / flume-interceptor-analytics
Real-time analytics in Apache Flume
☆51 · Feb 2, 2016 · Updated 10 years ago
cloudera-ps / prereq-checks
Prerequisites checker for Cloudera Manager and CDP PVC Base installations
☆57 · Oct 31, 2023 · Updated 2 years ago
Stratio / sparta
Real Time Analytics and Data Pipelines based on Spark Streaming
☆530 · Oct 24, 2019 · Updated 6 years ago
HiveRunner / HiveRunner
An Open Source unit test framework for Hive queries based on JUnit 4 and 5
☆262 · Jan 6, 2025 · Updated last year
seznam / euphoria
Euphoria is an open source Java API for creating unified big-data processing flows. It provides an engine independent programming model w…
☆81 · Nov 15, 2022 · Updated 3 years ago
jerryshao / spark-kafka-0-8-sql
Spark Structured Streaming Kafka 0.8 Source Implementation
☆35 · Apr 27, 2017 · Updated 9 years ago
cloudera / llama
Llama - Low Latency Application MAster
☆35 · Jun 27, 2022 · Updated 4 years ago
holdenk / high-performance-spark-examples
Examples for High Performance Spark
☆16 · Oct 25, 2025 · Updated 9 months ago
brightcove-archive / ooyala_scamr
A Hadoop map reduce framework for Scala.
☆15 · Apr 21, 2016 · Updated 10 years ago
jdye64 / docker-hwx
Combination of Dockerized Hortonworks projects and other Hadoop ecosystem components
☆10 · Oct 11, 2019 · Updated 6 years ago
cloudera-labs / cloudera.cloud
An Ansible collection for Cloudera Platform for cloud and Data Services
☆22 · Jul 8, 2026 · Updated 3 weeks ago
dxer / dataLink
A simple, easy-to-use ETL tool
☆17 · Mar 28, 2019 · Updated 7 years ago
non / clouseau
Discover java object sizes through questionable sleuthing plus luck.
☆71 · Jul 16, 2018 · Updated 8 years ago