data-dev / DataTracer
Data Lineage Tracing Library
☆21 · Updated 2 years ago
Related projects:
- Dremio Flight connector. Access Dremio using Arrow Flight ☆40 · Updated 3 years ago
- ThirdEye is an integrated tool for real-time monitoring of time series and interactive root-cause analysis. It enables anyone inside an or… ☆92 · Updated last year
- An open-source Python library for automated prediction engineering ☆45 · Updated this week
- Beneath is a serverless real-time data platform ⚡️ ☆81 · Updated 2 years ago
- ☆30 · Updated 3 years ago
- Dagster scikit-learn pipeline example. ☆43 · Updated last year
- Data Quality and Observability platform for the whole data lifecycle, from profiling new data sources to full automation with Data Observ… ☆102 · Updated this week
- Record matching and entity resolution at scale in Spark ☆31 · Updated 10 months ago
- Real-time data + ML pipeline ☆54 · Updated this week
- Sample configuration to deploy a modern data platform. ☆84 · Updated 2 years ago
- Delta reader for the Ray open-source toolkit for building ML applications ☆40 · Updated 7 months ago
- ☀️🦶 A lightweight framework for collaborative, open-source feature engineering ☆32 · Updated 2 years ago
- Python ELT Studio, an application for building ELT (and ETL) data flows. ☆57 · Updated 2 years ago
- ThirdEye is an integrated tool for real-time monitoring of time series and interactive root-cause analysis. ☆91 · Updated this week
- MLOps simplified. One platform, all the functionality you need. Swiss made ☆94 · Updated last week
- Elasticsearch implementation of the MLflow tracking store ☆16 · Updated 3 years ago
- ☆60 · Updated last month
- A library of Reversible Data Transforms ☆117 · Updated this week
- Python library to run ML/data pipelines on stateless compute infrastructure (that may be ephemeral or serverless). Please see the documen… ☆17 · Updated last year
- Python driver for Timeplus Enterprise or Timeplus Proton ☆11 · Updated last month
- Data Catalog for Databases and Data Warehouses ☆31 · Updated 8 months ago
- Python PMML scoring library for PySpark as SparkML Transformer ☆21 · Updated 2 weeks ago
- A collection of Python utility functions ☆12 · Updated 2 months ago
- Tutorials for Fugue: a unified interface for distributed computing. Fugue executes SQL, Python, and Pandas code on Spark and Dask withou… ☆111 · Updated 5 months ago
- Viewflow is an Airflow-based framework that allows data scientists to create data models without writing Airflow code. ☆122 · Updated 3 years ago
- Lossless in-memory compression of pandas DataFrames and Series powered by the visions type system. Up to 10x less RAM needed for the same… ☆28 · Updated last year
- ☆22 · Updated 2 years ago
- A Python-to-SQL transpiler as a replacement for Python Pandas ☆47 · Updated last year
- ☆28 · Updated 9 months ago
- Instant search for and access to many datasets in PySpark. ☆34 · Updated last year