huda-lab / synner
Generating Realistic Synthetic Data
☆33Updated last year
Alternatives and similar repositories for synner:
Users who are interested in synner are comparing it to the libraries listed below.
- MirrorDataGenerator is a Python tool that generates synthetic data based on user-specified causal relations among features in the data. I…☆21Updated 2 years ago
- Datapractices site☆33Updated last year
- Awesome Orchest projects, both official and submitted by the community.☆25Updated last year
- Provide an easy way with Python to protect your data sources by searching its metadata. 🛡️☆16Updated 2 weeks ago
- ☆36Updated this week
- Dremio Flight connector. Access Dremio using Arrow flight☆40Updated 4 years ago
- Batteries included toolkit for data engineering.☆33Updated last month
- real-time data + ML pipeline☆54Updated 2 weeks ago
- ☆11Updated 3 years ago
- Data Lineage Tracing Library☆22Updated 3 years ago
- A framework of open-source technologies to design real-time machine learning systems☆28Updated last year
- Data Catalog for Databases and Data Warehouses☆32Updated last year
- A curated list of awesome open source tools and commercial products to catalog, version, and manage data 🚀☆30Updated 2 years ago
- BigBertha is an architecture design that demonstrates how automated LLMOps (Large Language Models Operations) can be achieved on any Kube…☆27Updated last year
- Open Data Product Specification with examples☆9Updated last year
- Debussy is an opinionated Data Architecture and Engineering framework, enabling data analysts and engineers to build better platforms and…☆28Updated last year
- GraphRag vs Embeddings☆13Updated 7 months ago
- Sample configuration to deploy a modern data platform.☆87Updated 3 years ago
- plait.py - a fake data modeler☆433Updated 6 years ago
- Instant search for and access to many datasets in PySpark.☆34Updated 2 years ago
- A write-audit-publish implementation on a data lake without the JVM☆46Updated 6 months ago
- simplify working with DataHub API endpoints☆46Updated last month
- openclean - Data Cleaning and data profiling library for Python☆72Updated 3 years ago
- Birgitta is a Python ETL test and schema framework, providing automated tests for pyspark notebooks/recipes.☆14Updated last year
- ☆14Updated 3 years ago
- 📖 A curated list of resources dedicated to synthetic data☆125Updated 2 years ago
- Convert monolithic Jupyter notebooks 📙 into maintainable Ploomber pipelines. 📊☆78Updated 4 months ago
- Delta reader for the Ray open-source toolkit for building ML applications☆44Updated last year
- Amundsen Gremlin☆21Updated 2 years ago