Interana / eventsim
Event data simulator. Generates a stream of pseudo-random events from a set of users, designed to simulate web traffic.
☆521 · Updated 3 years ago
Alternatives and similar repositories for eventsim:
Users who are interested in eventsim are comparing it to the libraries listed below:
- pyspark methods to enhance developer productivity ☆670 · Updated last month
- ☆199 · Updated last year
- PySpark test helper methods with beautiful error messages ☆685 · Updated last week
- ETL best practices with airflow, with examples ☆1,330 · Updated 7 months ago
- BigQuery data source for Apache Spark: Read data from BigQuery into DataFrames, write DataFrames into BigQuery tables. ☆393 · Updated last week
- This is a guide to PySpark code style presenting common situations and the associated best practices based on the most frequent recurring… ☆1,131 · Updated 7 months ago
- Python API for Deequ ☆765 · Updated 3 weeks ago
- A curated list of data engineering tools for software developers ☆482 · Updated 7 years ago
- An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform. ☆1,365 · Updated 5 years ago
- Spark Gotchas. A subjective compilation of the Apache Spark tips and tricks ☆363 · Updated 7 years ago
- Spark style guide ☆259 · Updated 6 months ago
- Template for a data contract used in a data mesh. ☆471 · Updated last year
- Learn how to add data validation and documentation to a data pipeline built with dbt and Airflow. ☆168 · Updated last year
- Apache Airflow integration for dbt ☆402 · Updated 11 months ago
- Collection of dbt Tips and Tricks ☆386 · Updated 2 years ago
- Assets related to the operation of Fishtown Analytics. ☆419 · Updated 6 months ago
- Event data simulator. Generates a stream of pseudo-random events from a set of users, designed to simulate web traffic. ☆86 · Updated last year
- Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets. ☆3,406 · Updated last week
- A repository of sample code to accompany our blog post on Airflow and dbt. ☆172 · Updated last year
- Data quality testing for the modern data stack (SQL, Spark, and Pandas) https://www.soda.io ☆2,067 · Updated this week
- An Airflow docker image preconfigured to work well with Spark and Hadoop/EMR ☆174 · Updated last year
- Builds Airflow DAGs from configuration files. Powers all DAGs on the Etsy Data Platform ☆261 · Updated last year
- A Data Engineering & Machine Learning Knowledge Hub ☆1,126 · Updated last year
- Accumulated knowledge and experience in the field of Data Engineering ☆868 · Updated 2 years ago
- Repo to migrate old wiki to, esp for devs and code examples ☆185 · Updated 8 years ago
- Readings for Analytics Engineers ☆246 · Updated 2 years ago
- A boilerplate for writing PySpark Jobs ☆396 · Updated last year
- CLI that makes it easy to create, test and deploy Airflow DAGs to Astronomer ☆382 · Updated this week
- This is a repo documenting the best practices in PySpark.