pracdata / duckdb-pipeline
Demonstrating the capabilities of DuckDB as a transformation engine for data lakes
☆ 28 · Updated 9 months ago
Alternatives and similar repositories for duckdb-pipeline
Users interested in duckdb-pipeline are comparing it to the libraries listed below.
- ☆ 55 · Updated 2 months ago
- Declarative text-based tool for data analysts and engineers to extract, load, transform and orchestrate their data pipelines. ☆ 123 · Updated this week
- ☆ 12 · Updated last month
- Unity Catalog UI ☆ 40 · Updated 10 months ago
- A DataOps framework for building a lakehouse. ☆ 52 · Updated this week
- Sample code to collect Apache Iceberg metrics for table monitoring ☆ 28 · Updated 10 months ago
- ☆ 142 · Updated last month
- SQLMesh example projects ☆ 30 · Updated last week
- A write-audit-publish implementation on a data lake without the JVM ☆ 46 · Updated 11 months ago
- ☆ 17 · Updated 11 months ago
- Yet Another (Spark) ETL Framework ☆ 21 · Updated last year
- New-generation open-source data stack ☆ 70 · Updated 3 years ago
- A portable Datamart and Business Intelligence suite built with Docker, SQLMesh + dbt-core, DuckDB and Superset ☆ 52 · Updated 8 months ago
- Rewrite BigQuery, Redshift, Snowflake and Databricks queries into DuckDB-compatible SQL (with deep transformation of functions, data type… ☆ 55 · Updated last week
- Arcane Insight is a data analytics project designed to harness the power of SQLMesh & DuckDB to collect, transform, and analyze data from… ☆ 34 · Updated 5 months ago
- Using DuckDB with AWS Lambda to process Delta Lake data ☆ 28 · Updated 5 months ago
- Full-stack data engineering tools and infrastructure setup ☆ 53 · Updated 4 years ago
- DataOps Data Quality TestGen is part of DataKitchen's Open Source Data Observability. DataOps TestGen delivers simple, fast data qualit… ☆ 59 · Updated 3 weeks ago
- Python package for querying Iceberg data through DuckDB. ☆ 70 · Updated last year
- A portable Datamart and Business Intelligence suite built with Docker, Mage, dbt, DuckDB and Superset ☆ 53 · Updated 8 months ago
- Contribute to dlt verified sources 🔥 ☆ 87 · Updated 2 weeks ago
- This repo contains examples of high-throughput ingestion using Apache Spark and Apache Iceberg. These examples cover IoT and CDC scenario… ☆ 25 · Updated last week
- Data Quality and Observability platform for the whole data lifecycle, from profiling new data sources to full automation with Data Observ… ☆ 154 · Updated this week
- JumpSpark: a modern cookiecutter template for PySpark projects with batteries included. ☆ 10 · Updated 2 years ago
- Iceberg Playground in a Box ☆ 56 · Updated 2 weeks ago
- Nicely modeled data built on the GitHub Archive. ☆ 67 · Updated last year
- DuckDB API Server with Arrow Flight SQL Airport support and concurrent writes/reads (quackpipe) ☆ 91 · Updated 4 months ago
- Python wrapper for the Sling CLI tool ☆ 53 · Updated this week
- Example files used in the DuckDB - Unity Catalog blog ☆ 10 · Updated 7 months ago
- SQL query executor on a remote DuckDB instance using Apache Arrow Flight RPC through a Streamlit web interface. ☆ 15 · Updated 8 months ago