jwoschitz / avrocount
Count records in Avro files efficiently
☆17 Updated 2 years ago
Alternatives and similar repositories for avrocount
Users that are interested in avrocount are comparing it to the libraries listed below
- Scala and Spark library focused on reading OpenStreetMap Pbf files. ☆86 Updated 2 weeks ago
- A Giter8 template for scio ☆31 Updated 2 weeks ago
- A converter for the OSM PBFs to Parquet files ☆94 Updated 2 years ago
- lakeview is a visibility tool for S3 based data lakes ☆29 Updated last month
- A command line client for consuming Postgres logical decoding events in the pgoutput format ☆17 Updated last month
- Java/Scala library for easily authoring Flyte tasks and workflows ☆44 Updated 3 weeks ago
- Hadoop output committers for S3 ☆111 Updated 5 years ago
- Demo using Apache Lucene as a reverse geocoder, running as a CLI app via Graal, AWS Lambda or Google Cloud Run ☆12 Updated 4 years ago
- Tools for working with parquet, impala, and hive ☆134 Updated 4 years ago
- Better bridge apache spark and postgresql ☆23 Updated last year
- Scala SDK for working with Snowplow enriched events in Spark, AWS Lambda, Flink et al. ☆21 Updated 10 months ago
- Airflow declarative DAGs via YAML ☆133 Updated last year
- DBeam exports SQL tables into Avro files using JDBC and Apache Beam ☆195 Updated this week
- Spark DataFrames for earth observation data ☆19 Updated 7 years ago
- Command line (CLI) tool to inspect Apache Parquet files on the go ☆194 Updated last year
- A client for the Confluent Schema Registry API implemented in Python ☆53 Updated 2 years ago
- JSON schema parser for Apache Spark ☆81 Updated 2 years ago
- The Amazon S3 Tables catalog is a client library that bridges control plane operations provided by S3 Tables to engines like Apache Spark… ☆137 Updated 3 weeks ago
- Schema registry for CSV, TSV, JSON, AVRO and Parquet schema. Supports schema inference and GraphQL API. ☆112 Updated 5 years ago
- Bulletproof Apache Spark jobs with fast root cause analysis of failures. ☆73 Updated 4 years ago
- Avro record class and reader generator ☆20 Updated 3 years ago
- Qbeast-spark: DataSource enabling multi-dimensional indexing and efficient data sampling. Big Data, free from the unnecessary! ☆233 Updated 7 months ago
- Native Spark OSM PBF data source ☆18 Updated last year
- Parquet Command-line Tools ☆19 Updated 8 years ago
- Pylint plugin for static code analysis on Airflow code ☆95 Updated 4 years ago
- A library on top of either pex or conda-pack to make your Python code easily available on a cluster ☆45 Updated last month
- Algebird's HyperLogLog support for Apache Spark. ☆10 Updated 8 years ago
- Apache (Py)Spark type annotations (stub files). ☆117 Updated 3 years ago
- a spark custom window function example, to generate session IDs ☆18 Updated 7 years ago
- A tool for data sampling, data generation, and data diffing ☆344 Updated 4 months ago