jwoschitz / avrocount
Count records in Avro files efficiently
☆18 Updated 2 years ago
Alternatives and similar repositories for avrocount
Users who are interested in avrocount are comparing it to the libraries listed below.
- A library that provides useful extensions to Apache Spark and PySpark. ☆232 Updated 3 weeks ago
- A Python client for Apache Livy, enabling use of remote Apache Spark clusters. ☆70 Updated 4 years ago
- Airflow declarative DAGs via YAML ☆133 Updated 2 years ago
- easy install parquet-tools ☆184 Updated last year
- Apache (Py)Spark type annotations (stub files). ☆118 Updated 3 years ago
- Black for Databricks notebooks ☆48 Updated 8 months ago
- Composable filesystem hooks and operators for Apache Airflow. ☆17 Updated 4 years ago
- Pylint plugin for static code analysis on Airflow code ☆97 Updated 5 years ago
- Nested array transformation helper extensions for Apache Spark ☆37 Updated 2 years ago
- Command line (CLI) tool to inspect Apache Parquet files on the go ☆198 Updated 2 years ago
- Airflow Backfill UI based plugin for existing / new Airflow environment ☆64 Updated 5 years ago
- The Amazon S3 Tables catalog is a client library that bridges control plane operations provided by S3 Tables to engines like Apache Spark… ☆149 Updated 2 weeks ago
- pytest plugin to run the tests with support of pyspark ☆88 Updated 8 months ago
- A Giter8 template for scio ☆31 Updated last week
- Avro SerDe for Apache Spark structured APIs. ☆242 Updated 8 months ago
- JSON schema parser for Apache Spark ☆82 Updated 3 years ago
- Bulletproof Apache Spark jobs with fast root cause analysis of failures. ☆73 Updated 4 years ago
- Easy CPU Profiling for Apache Spark applications ☆49 Updated last month
- Avro record class and reader generator ☆20 Updated 3 years ago
- Spark data profiling utilities ☆22 Updated 7 years ago
- Builds Airflow DAGs from configuration files. Powers all DAGs on the Etsy Data Platform ☆259 Updated 2 years ago
- Dissecting data structures ☆343 Updated 2 months ago
- ✨ A Pydantic to PySpark schema library ☆118 Updated this week
- Airflow plugin to transfer arbitrary files between operators ☆78 Updated 7 years ago
- Task Metrics Explorer ☆14 Updated 6 years ago
- Performant Redshift data source for Apache Spark ☆141 Updated 3 weeks ago
- Spark style guide ☆271 Updated last year
- A Spark-based data comparison tool at scale which facilitates software development engineers to compare a plethora of pair combinations o… ☆52 Updated 7 months ago
- Qbeast-spark: DataSource enabling multi-dimensional indexing and efficient data sampling. Big Data, free from the unnecessary! ☆235 Updated last year
- A declarative PySpark framework for row- and aggregate-level data quality validation. ☆66 Updated last month