datumo / dataset-logger
Apache Spark Scala utility to track data records during application execution
☆11 · Updated last year
Related projects
Alternatives and complementary repositories for dataset-logger
- Avro Schema to Avro IDL converter ☆15 · Updated 3 years ago
- Code snippets used in demos recorded for the blog. ☆29 · Updated 3 weeks ago
- Fast Apache Avro serialization/deserialization library ☆43 · Updated 4 years ago
- A library that provides in-memory instances of both Kafka and Confluent Schema Registry to run your tests against. ☆111 · Updated last week
- Circus Train is a dataset replication tool that copies Hive tables between clusters and clouds. ☆86 · Updated 8 months ago
- A dynamic data completeness and accuracy library at enterprise scale for Apache Spark ☆30 · Updated last week
- Avro Schema Evolution made easy ☆34 · Updated 9 months ago
- Magic to help Spark pipelines upgrade ☆33 · Updated last month
- Type-class based data cleansing library for Apache Spark SQL ☆79 · Updated 5 years ago
- Examples for High Performance Spark ☆15 · Updated last week
- Grok Expression Transform for Kafka Connect. ☆16 · Updated 3 years ago
- Schema Registry integration for Apache Spark ☆39 · Updated last year
- Kafka Connector for Iceberg tables ☆16 · Updated last year
- BigQuery integration to Apache Flink's Table API ☆22 · Updated last week
- JSON schema parser for Apache Spark ☆81 · Updated 2 years ago
- Spark Structured Streaming State Tools ☆34 · Updated 4 years ago
- Flowchart for debugging Spark applications ☆101 · Updated last month
- Big Data Newsletter ☆23 · Updated 7 months ago
- File compaction tool that runs on top of the Spark framework. ☆59 · Updated 5 years ago
- Open Source Secret Provider plugin for the Kafka Connect framework ☆45 · Updated 3 months ago
- A library you can include in your Spark job to validate the counters and perform operations on success. Goal is Scala/Java/Python support… ☆107 · Updated 6 years ago
- A Logstash codec plugin for decoding and encoding Avro records ☆26 · Updated 8 months ago
- Task Metrics Explorer ☆13 · Updated 5 years ago
- Kafka Streams demo project containing Derivative Events, the Processor API and Wall-clock examples ☆26 · Updated 4 years ago
- ☆26 · Updated 4 years ago
- A user-friendly API for checking for and reporting on Avro schema incompatibilities. ☆59 · Updated 8 months ago
- Analyze Slack channel history by counting threads with specified keywords ☆11 · Updated 2 years ago
- Nested array transformation helper extensions for Apache Spark ☆36 · Updated last year