amundsen-io/amundsen

Amundsen is a metadata driven application for improving the productivity of data analysts, data scientists and engineers when interacting with data.

☆4,780

Alternatives and similar repositories for amundsen

Users who are interested in amundsen compare it to the repositories listed below.

datahub-project / datahub
The Context Platform for your Data and AI Stack
☆12,347 · Updated this week
MarquezProject / marquez
Collect, aggregate, and visualize a data ecosystem's metadata
☆2,248 · Updated this week
apache / atlas
Apache Atlas - Open Metadata Management and Governance capabilities across the Hadoop platform and beyond
☆2,126 · Updated this week
fivetran / great_expectations
Always know what to expect from your data.
☆11,668 · Updated this week
Netflix / metacat
☆1,688 · Jul 16, 2026 · Updated last week
OpenLineage / OpenLineage
An Open Standard for lineage metadata collection
☆2,562 · Updated this week
dbt-labs / dbt-core
dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build application…
☆13,514 · Updated this week
amundsen-io / amundsendatabuilder
Data ingestion library for Amundsen to build graph and search index
☆205 · Mar 13, 2024 · Updated 2 years ago
awslabs / deequ
Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.
☆3,637 · Updated this week
dagster-io / dagster
An orchestration platform for the development, production, and observation of data assets.
☆15,898 · Updated this week
airbytehq / airbyte
Open-source data movement for ELT pipelines and AI agents — from APIs, databases & files to warehouses, lakes, and AI applications. Both …
☆21,696 · Updated this week
delta-io / delta
An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Tr…
☆8,926 · Updated this week
sodadata / soda-core
Data Contracts engine for the modern data stack. https://www.soda.io
☆2,397 · Updated this week
odpi / egeria
Egeria core
☆918 · Updated this week
open-metadata / OpenMetadata
The Open Context Layer for Data and AI, OpenMetadata is the open platform for building trusted data context and business semantics for …
☆14,562 · Updated this week
trinodb / trino
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
☆13,073 · Updated this week
re-data / re-data
re_data - fix data issues before your users & CEO would discover them 😊
☆1,566 · Apr 30, 2024 · Updated 2 years ago
apache / airflow
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
☆46,243 · Updated this week
apache / iceberg
Apache Iceberg
☆9,079 · Updated this week
apache / hudi
Upserts, Deletes And Incremental Processing on Big Data.
☆6,194 · Updated this week
PrefectHQ / prefect
Prefect is a workflow orchestration framework for building resilient data pipelines in Python.
☆23,479 · Updated this week
flyteorg / flyte
Dynamic, resilient AI orchestration. Coordinate data, models, and compute as you build AI workflows.
☆7,149 · Updated this week
feast-dev / feast
The Open Source Feature Store for AI/ML
☆7,171 · Updated this week
apache / superset
Apache Superset is a Data Visualization and Data Exploration Platform
☆73,976 · Updated this week
Netflix / metaflow
Build, Manage and Deploy AI/ML Systems
☆10,196 · Updated this week
rsyi / whale
🐳 The stupidly simple CLI workspace for your data warehouse.
☆727 · Feb 8, 2023 · Updated 3 years ago
opendatadiscovery / odd-platform
First open-source data discovery and observability platform. We make a life for data practitioners easy so you can focus on your business…
☆1,420 · Jul 8, 2026 · Updated 2 weeks ago
amundsen-io / amundsenfrontendlibrary
Front-end service library for Amundsen
☆278 · Feb 10, 2026 · Updated 5 months ago
elementary-data / elementary
The dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-host…
☆2,382 · Updated this week
amundsen-io / amundsenmetadatalibrary
Metadata service library for Amundsen
☆82 · Feb 20, 2026 · Updated 5 months ago
treeverse / lakeFS
lakeFS - Data version control for your data lake | Git for data
☆5,470 · Updated this week
databricks / koalas
Koalas: pandas API on Apache Spark
☆3,372 · Mar 20, 2024 · Updated 2 years ago
magda-io / magda
A federated, open-source data catalog for all your big data and small data
☆604 · Updated this week
jghoman / awesome-apache-airflow
Curated list of resources about Apache Airflow
☆3,922 · May 7, 2026 · Updated 2 months ago
sqlfluff / sqlfluff
A modular SQL linter and auto-formatter with support for multiple dialects and templated code.
☆9,825 · Updated this week
kedro-org / kedro
Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and…
☆10,931 · Updated this week
apache / pinot
Apache Pinot - A realtime distributed OLAP datastore
☆6,117 · Updated this week
tobymao / sqlglot
Python SQL Parser and Transpiler
☆9,461 · Updated this week
projectnessie / nessie
Nessie: Transactional Catalog for Data Lakes with Git-like semantics
☆1,483 · Updated this week