A write-audit-publish implementation on a data lake without the JVM
☆45 · Aug 12, 2024 · Updated last year
Alternatives and similar repositories for no-jvm-wap-with-iceberg
Users who are interested in no-jvm-wap-with-iceberg are comparing it to the libraries listed below.
- A playground for running duckdb as a stateless query engine over a data lake ☆219 · Jan 10, 2024 · Updated 2 years ago
- ☆22 · Feb 5, 2024 · Updated 2 years ago
- Testing various methods of moving Arrow data between processes ☆16 · Mar 29, 2023 · Updated 3 years ago
- Capture the logical plan from Spark (SQL) ☆22 · Mar 6, 2021 · Updated 5 years ago
- Personal Finance Project to automatically collect Swiss banking transactions into a DWH and visualise them ☆25 · Mar 3, 2024 · Updated 2 years ago
- Open-source agentic schema CLI. Optimised for Claude Code, Gemini, Codex and Copilot. Skills included. ☆44 · Mar 20, 2026 · Updated last week
- How to evaluate the Quality of your Data with Great Expectations and Spark. ☆31 · Mar 29, 2023 · Updated 3 years ago
- SQL query executor on remote DuckDB instance using Apache Arrow Flight RPC through Streamlit Web interface. ☆25 · Nov 2, 2024 · Updated last year
- Demo repository to lambda-fy your dbt runs ☆11 · Sep 7, 2023 · Updated 2 years ago
- A DataFusion-powered Serverless S3 Proxy. ☆17 · Apr 15, 2024 · Updated last year
- Serve a 1x1 GIF pixel from an AWS lambda-powered endpoint ☆13 · Sep 7, 2017 · Updated 8 years ago
- A dbt package to run natural language queries ☆10 · Jan 13, 2023 · Updated 3 years ago
- DuckDB API Server with Arrow Flight SQL Airport support and concurrent writes/reads (quackpipe) ☆121 · Mar 5, 2025 · Updated last year
- ☆187 · May 21, 2025 · Updated 10 months ago
- 🏟 ☆28 · Nov 11, 2020 · Updated 5 years ago
- Malloy model examples and associated datasets ☆23 · Feb 1, 2026 · Updated last month
- Unleash the performance potential of your Parquet files. ☆45 · Feb 24, 2026 · Updated last month
- A library for parsing images in Mojo ☆20 · Apr 14, 2025 · Updated 11 months ago
- Template-based generation of DAG cards from Metaflow classes, inspired by Google cards for machine learning models. ☆30 · Dec 7, 2021 · Updated 4 years ago
- Anki Overdrive API for Python ☆12 · Oct 21, 2017 · Updated 8 years ago
- ☆22 · Mar 31, 2022 · Updated 3 years ago
- End to end data engineering project ☆58 · Oct 27, 2022 · Updated 3 years ago
- GAPandas4 is a Python package for querying the Google Analytics Data API for GA4 and displaying the results in a Pandas dataframe. ☆34 · Jul 6, 2022 · Updated 3 years ago
- Building a poor man's data lake: Exploring the Power of Polars and Delta Lake ☆11 · Dec 6, 2025 · Updated 3 months ago
- ☆12 · Oct 25, 2023 · Updated 2 years ago
- Apache Spark Connect Client for Rust ☆117 · Jun 10, 2025 · Updated 9 months ago
- Trainable embedding transformation for confidence estimation, feature extraction, explainability and conversion from dense to sparse. ☆26 · Jun 9, 2025 · Updated 9 months ago
- A work-in-progress book on Dask ☆12 · Jul 15, 2023 · Updated 2 years ago
- A high-performance data streaming system using DuckDB and Apache Arrow Flight. ☆96 · Feb 22, 2025 · Updated last year
- Datalog implementation in Scala. ☆12 · Jun 17, 2014 · Updated 11 years ago
- A flake8 plugin that detects usage of withColumn in a loop or inside reduce ☆28 · Jun 20, 2025 · Updated 9 months ago
- Example for simple Apache Arrow Flight service with Apache Spark and TensorFlow clients ☆37 · Mar 9, 2021 · Updated 5 years ago
- This project implements a Lakehouse Medallion Architecture using modern Data Stack tools such as Fivetran, Snowflake and dbt. The fictici… ☆14 · Sep 30, 2024 · Updated last year
- Composable expressions for data pipelines ☆502 · Updated this week
- A demo instance of mage for pulling sample data from a public Google pub/sub topic and transforming with dbt. ☆12 · Jan 5, 2024 · Updated 2 years ago
- Personal project for setting up an open source data warehouse. ☆32 · Jul 11, 2025 · Updated 8 months ago
- Markify is an open source command line application written in Python which scrapes data from your social media accounts and utilises mark… ☆13 · Aug 8, 2024 · Updated last year
- reference implementations and use cases done with bauplan ☆62 · Updated this week
- An implementation of Defeasible Logic in Python ☆15 · Sep 2, 2018 · Updated 7 years ago