VorTECHsa / refinery
Refinery is a tool to extract and transform semi-structured data from Excel spreadsheets of different layouts in a declarative way.
☆50 · Updated 2 months ago
Alternatives and similar repositories for refinery:
Users who are interested in refinery are comparing it to the libraries listed below:
- sgr (command line client for Splitgraph) and the splitgraph Python library ☆322 · Updated 11 months ago
- A curated list to help you manage temporal data across many modalities 🚀. ☆110 · Updated 2 years ago
- In-Memory Analytics for Kafka using DuckDB ☆108 · Updated this week
- Data Tools Subjective List ☆83 · Updated last year
- Schema registry for CSV, TSV, JSON, AVRO and Parquet schema. Supports schema inference and GraphQL API. ☆111 · Updated 5 years ago
- Transporter for integrating OpenLineage with OpenMetadata ☆12 · Updated last year
- Rewrite BigQuery, Redshift, Snowflake and Databricks queries into DuckDB compatible SQL (with deep transformation of functions, data type… ☆47 · Updated last week
- DB API 2 interface for Flight SQL with SQLAlchemy extras. ☆37 · Updated 6 months ago
- JSON Schema to Avro Mapper ☆29 · Updated last year
- The Data Product Descriptor Specification (DPDS) Repository ☆77 · Updated 2 months ago
- GraphQL service for arrow tables and parquet data sets. ☆88 · Updated 2 months ago
- Arc is an opinionated framework for defining data pipelines which are predictable, repeatable and manageable. ☆169 · Updated last year
- Aiven's S3 Sink Connector for Apache Kafka® ☆69 · Updated 6 months ago
- Deephaven CSV ☆58 · Updated 3 months ago
- Python binding for DataFusion ☆59 · Updated 2 years ago
- ☆22 · Updated 3 weeks ago
- Delta reader for the Ray open-source toolkit for building ML applications ☆45 · Updated last year
- Flowman is an ETL framework powered by Apache Spark. With its declarative approach, Flowman simplifies the development of complex data pi… ☆94 · Updated 3 weeks ago
- Open-source metadata collector based on ODD Specification ☆43 · Updated last year
- Amundsen Gremlin ☆21 · Updated 2 years ago
- ☆17 · Updated 10 months ago
- An implementation of the DatasourceV2 interface of Apache Spark™ for writing Spark Datasets to Apache Druid™. ☆41 · Updated 5 months ago
- ThirdEye is an integrated tool for realtime monitoring of time series and interactive root-cause analysis. It enables anyone inside an or… ☆92 · Updated 2 years ago
- ODD Specification is a universal open standard for collecting metadata. ☆135 · Updated 5 months ago
- ☆19 · Updated 9 months ago
- Tutorials for Fugue - A unified interface for distributed computing. Fugue executes SQL, Python, and Pandas code on Spark and Dask withou… ☆113 · Updated last year
- A playground for running duckdb as a stateless query engine over a data lake ☆192 · Updated last year
- A Kafka Serde that reads and writes records from and to Blob storage (S3, Azure, Google) transparently. ☆59 · Updated this week
- Data pipelines from re-usable components ☆108 · Updated 2 years ago
- A pandas I/O wrapper. ☆29 · Updated this week