YotpoLtd/metorikku

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/YotpoLtd/metorikku)

YotpoLtd / metorikku

A simplified, lightweight ETL Framework based on Apache Spark

☆588

Alternatives and similar repositories for metorikku

Users that are interested in metorikku are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

SETL-Framework / setl
View on GitHub
A simple Spark-powered ETL framework that just works 🍺
☆185Oct 2, 2025Updated 9 months ago
qubole / sparklens
View on GitHub
Qubole Sparklens tool for performance tuning Apache Spark
☆592Jun 26, 2024Updated 2 years ago
awslabs / deequ
View on GitHub
Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.
☆3,631Updated this week
smart-data-lake / smart-data-lake
View on GitHub
Smart Automation Tool for building modern Data Lakes and Data Pipelines
☆129Updated this week
mrpowers-io / spark-daria
View on GitHub
Essential Spark extensions and helper methods ✨😲
☆767Jun 22, 2026Updated 3 weeks ago
Managed hosting for WordPress and PHP on Cloudways • Ad
Managed hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
AbsaOSS / spline
View on GitHub
Data Lineage Tracking And Visualization Solution
☆662Updated this week
LucaCanali / sparkMeasure
View on GitHub
This repository contains the development code for sparkMeasure, an Apache Spark performance analysis and troubleshooting library. It simp…
☆827May 19, 2026Updated last month
apache / kyuubi
View on GitHub
Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.
☆2,351Updated this week
mayur2810 / sope
View on GitHub
Apache Spark ETL Utilities
☆40Oct 23, 2024Updated last year
linkedin / transport
View on GitHub
A framework for writing performant user-defined functions (UDFs) that are portable across a variety of engines including Apache Spark, Ap…
☆306Jun 29, 2026Updated 2 weeks ago
CoxAutomotiveDataSolutions / waimak
View on GitHub
Waimak is an open-source framework that makes it easier to create complex data flows in Apache Spark.
☆76Apr 24, 2024Updated 2 years ago
linkedin / coral
View on GitHub
Coral is a translation, analysis, and query rewrite engine for SQL and other relational languages.
☆906Jun 9, 2026Updated last month
AbsaOSS / hyperdrive
View on GitHub
Extensible streaming ingestion pipeline on top of Apache Spark
☆47Jul 17, 2025Updated 11 months ago
delta-io / delta
View on GitHub
An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Tr…
☆8,905Updated this week
Deploy open-source AI quickly and easily - Special Bonus Offer • Ad
Runpod Hub is built for open source. One-click deployment and autoscaling endpoints without provisioning your own infrastructure.
amundsen-io / amundsen
View on GitHub
Amundsen is a metadata driven application for improving the productivity of data analysts, data scientists and engineers when interacting…
☆4,782Jul 1, 2026Updated 2 weeks ago
uber / marmaray
View on GitHub
Generic Data Ingestion & Dispersal Library for Hadoop
☆482Mar 19, 2023Updated 3 years ago
sparsecode / DaFlow
View on GitHub
Apache-Spark based Data Flow(ETL) Framework which supports multiple read, write destinations of different types and also support multiple…
☆26Jun 7, 2021Updated 5 years ago
cloudera-labs / envelope
View on GitHub
Build configuration-driven ETL pipelines on Apache Spark
☆162Oct 4, 2022Updated 3 years ago
apache / hudi
View on GitHub
Upserts, Deletes And Incremental Processing on Big Data.
☆6,186Updated this week
microsoft / hyperspace
View on GitHub
An open source indexing subsystem that brings index-based query acceleration to Apache Spark™ and big data workloads.
☆430Jan 14, 2022Updated 4 years ago
holdenk / spark-testing-base
View on GitHub
Base classes to use when writing tests with Spark
☆1,552Apr 20, 2026Updated 2 months ago
mrpowers-io / spark-fast-tests
View on GitHub
Apache Spark testing helpers (dependency free & works with Scalatest, uTest, and MUnit)
☆458Apr 2, 2026Updated 3 months ago
oap-project / gazelle_plugin
View on GitHub
Native SQL Engine plugin for Spark SQL with vectorized SIMD optimizations.
☆255Feb 21, 2023Updated 3 years ago
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
G-Research / spark-extension
View on GitHub
A library that provides useful extensions to Apache Spark and PySpark.
☆238Jul 1, 2026Updated last week
harbby / sylph
View on GitHub
Stream computing platform for bigdata
☆406Apr 24, 2024Updated 2 years ago
typelevel / frameless
View on GitHub
Expressive types for Spark.
☆898Updated this week
qubole / spark-acid
View on GitHub
ACID Data Source for Apache Spark based on Hive ACID
☆97Jul 7, 2021Updated 5 years ago
Teradata / kylo
View on GitHub
Kylo is a data lake management software platform and framework for enabling scalable enterprise-class data lakes on big data technologies…
☆1,111Jan 12, 2023Updated 3 years ago
AbsaOSS / spline-spark-agent
View on GitHub
Spline agent for Apache Spark
☆206Updated this week
MarquezProject / marquez
View on GitHub
Collect, aggregate, and visualize a data ecosystem's metadata
☆2,240Jul 6, 2026Updated last week
hortonworks-spark / spark-atlas-connector
View on GitHub
A Spark Atlas connector to track data lineage in Apache Atlas
☆268Nov 16, 2022Updated 3 years ago
jupyter-incubator / sparkmagic
View on GitHub
Jupyter magics and kernels for working with remote Spark clusters
☆1,362Sep 9, 2025Updated 10 months ago
1-Click AI Models by DigitalOcean Gradient • Ad
Deploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click. Zero configuration with optimized deployments.
apache / livy
View on GitHub
Apache Livy is an open source REST interface for interacting with Apache Spark from anywhere.
☆959Updated this week
snowflakedb / spark-snowflake
View on GitHub
Snowflake Data Source for Apache Spark.
☆231Updated this week
yamrcraft / etl-light
View on GitHub
A light Kafka to HDFS/S3 ETL library based on Apache Spark
☆40Jun 29, 2017Updated 9 years ago
datamindedbe / lighthouse
View on GitHub
Lighthouse is a library for data lakes built on top of Apache Spark. It provides high-level APIs in Scala to streamline data pipelines an…
☆64Sep 6, 2024Updated last year
dimajix / flowman
View on GitHub
Flowman is an ETL framework powered by Apache Spark. With its declarative approach, Flowman simplifies the development of complex data pi…
☆97Updated this week
mrpowers-io / quinn
View on GitHub
pyspark methods to enhance developer productivity 📣 👯 🎉
☆687Jun 9, 2026Updated last month
datahub-project / datahub
View on GitHub
The Context Platform for your Data and AI Stack
☆12,272Updated this week