RealImpactAnalytics / trumania
Trumania is a scenario-based random dataset generator library for Python 3.
☆111 · Updated 2 years ago
Alternatives and similar repositories for trumania:
Users who are interested in trumania are comparing it to the libraries listed below:
- Dockerfiles for images used as part of the Orbyter toolset ☆44 · Updated 9 months ago
- Repo demonstrating a Dagster pipeline to generate Neo4j Graph ☆21 · Updated 3 years ago
- Data Exploration in PySpark made easy - Pyspark_dist_explore provides methods to get fast insights in your Spark DataFrames. ☆103 · Updated 5 years ago
- Machine Flow enables visual execution and tracking of machine learning workflows. Users dynamically create dependency graphs, with each n… ☆62 · Updated 6 years ago
- A series of workshop modules introducing Feast feature store. ☆19 · Updated 2 years ago
- Using Luigi to create a Machine Learning Pipeline using the Rossmann Sales data from Kaggle ☆33 · Updated 8 years ago
- ☆110 · Updated last month
- A Scalable Data Cleaning Library for PySpark. ☆26 · Updated 5 years ago
- Automated Data Science and Machine Learning library to optimize workflow. ☆104 · Updated 2 years ago
- Server that simplifies connecting pandas to a realtime data feed, testing hypothesis and visualizing results in a web browser ☆33 · Updated last year
- Record matching and entity resolution at scale in Spark ☆34 · Updated last year
- Common data science and data engineering utilities to help us perform analytics. Our toolbox for data scientists, licensed under Apache-2… ☆30 · Updated 6 years ago
- Notebooks for the ML Link Prediction Course ☆14 · Updated 4 years ago
- Build your feature store with macros right within your dbt repository ☆38 · Updated 2 years ago
- Big Data Demystified meetup and blog examples ☆31 · Updated 6 months ago
- Automatically export Jupyter notebooks to various file formats (.py, .html, and more) on save. ☆75 · Updated last year
- How to use Python to understand data and transform the data into a tidy format ready to be used for modelling and visualisation. ☆37 · Updated 5 years ago
- Blog post on ETL pipelines with Airflow ☆23 · Updated 4 years ago
- PySpark phonetic and string matching algorithms ☆39 · Updated last year
- Scaffold of Apache Airflow executing Docker containers ☆85 · Updated 2 years ago
- ☆8 · Updated 5 years ago
- Create HTML profiling reports from Apache Spark DataFrames ☆195 · Updated 5 years ago
- HandySpark - bringing pandas-like capabilities to Spark dataframes ☆192 · Updated 5 years ago
- ☆11 · Updated 6 years ago
- Predict the poverty of households in Costa Rica using automated feature engineering. ☆23 · Updated 4 years ago
- Utilities for creating ETL pipelines with mara ☆36 · Updated 2 years ago
- Machine Learning in Snowflake ☆24 · Updated 5 years ago
- locopy: Loading/Unloading to Redshift and Snowflake using Python. ☆106 · Updated this week
- Primrose modeling framework for simple production models ☆33 · Updated 11 months ago
- This repo demonstrates how to load a sample Parquet-formatted file from an AWS S3 bucket. A Python job will then be submitted to an Apach… ☆19 · Updated 8 years ago