File compaction tool that runs on top of the Spark framework.
☆59 · Apr 17, 2019 · Updated 6 years ago
Alternatives and similar repositories for spark-compaction
Users who are interested in spark-compaction are comparing it to the libraries listed below.
- Remedy small files by combining them into larger ones. ☆23 · Oct 31, 2018 · Updated 7 years ago
- Hadoop utility to compact small files ☆18 · Feb 16, 2026 · Updated last week
- Integration of Iceberg table management into Spark SQL ☆11 · Jan 21, 2020 · Updated 6 years ago
- Kafka Examples repository. ☆44 · Feb 5, 2019 · Updated 7 years ago
- ☆26 · Dec 18, 2019 · Updated 6 years ago
- ☆15 · Jul 28, 2017 · Updated 8 years ago
- A dynamic data completeness and accuracy library at enterprise scale for Apache Spark ☆29 · Nov 4, 2024 · Updated last year
- Build configuration-driven ETL pipelines on Apache Spark ☆161 · Oct 4, 2022 · Updated 3 years ago
- High performance HBase / Spark SQL engine ☆28 · Jul 7, 2022 · Updated 3 years ago
- Atomix Jepsen tests ☆14 · Feb 7, 2017 · Updated 9 years ago
- Camus Compressor merges files created by Camus and saves them in a compressed format. ☆13 · Mar 20, 2023 · Updated 2 years ago
- Instructions for setting up Kerberos, Zookeeper, and Kafka with SASL ☆16 · Jan 22, 2018 · Updated 8 years ago
- Data Quality Monitoring Tool ☆15 · Dec 5, 2017 · Updated 8 years ago
- hello-streams :: Introducing the stream-first mindset ☆16 · Mar 5, 2024 · Updated last year
- Examples for Apache Oozie book ☆18 · May 30, 2016 · Updated 9 years ago
- Code examples for my blog posts ☆22 · Nov 7, 2018 · Updated 7 years ago
- Maelstrom is an open source Kafka integration with Spark that is designed to be developer friendly, high performance (millisecond stream … ☆22 · Feb 6, 2017 · Updated 9 years ago
- Simple example of Solr Block Joins between Parents and Children, implemented in SolrJ ☆22 · Jul 2, 2014 · Updated 11 years ago
- A tool for data sampling, data generation, and data diffing ☆345 · Jan 8, 2026 · Updated last month
- Parquet file generator ☆22 · Apr 17, 2018 · Updated 7 years ago
- Typeclass interfaces to access user-defined Scala annotations ☆24 · Feb 12, 2025 · Updated last year
- Opinionated CNCF-based, Docker Compose setup for everything needed to develop a 12factor app ☆18 · Feb 23, 2022 · Updated 4 years ago
- Multi-stage, config driven, SQL based ETL framework using PySpark ☆26 · Sep 16, 2019 · Updated 6 years ago
- A convenience library for Apache Kafka integration in a Dropwizard service. ☆24 · Feb 23, 2026 · Updated last week
- ☆23 · Oct 8, 2018 · Updated 7 years ago
- Liquibase extension to add Impala Database support ☆24 · Mar 8, 2022 · Updated 3 years ago
- Data sets and Vagrant script to provision a virtual machine for Apache Calcite development ☆30 · Mar 24, 2023 · Updated 2 years ago
- Small app using spring, akka, kafka, mongo db ☆23 · Feb 24, 2016 · Updated 10 years ago
- Kafka to Avro Writer based on Apache Beam. It's a generic solution that reads data from multiple kafka topics and stores it on in cloud s… ☆25 · Apr 7, 2021 · Updated 4 years ago
- ☆243 · Jun 14, 2018 · Updated 7 years ago
- Demo code for implementing and showcasing a Fraud Detection Engine with Apache Flink. ☆33 · Oct 20, 2022 · Updated 3 years ago
- Kite SDK Examples ☆99 · May 8, 2021 · Updated 4 years ago
- Profiler for large-scale distributed java applications (Spark, Scalding, MapReduce, Hive,...) on YARN. ☆128 · Sep 7, 2018 · Updated 7 years ago
- Collection of open-source Spark tools & frameworks that have made the data engineering and data science teams at Swoop highly productive ☆186 · Oct 15, 2025 · Updated 4 months ago
- Simple Spark example of generating table stats for use of data quality checks ☆28 · Apr 28, 2017 · Updated 8 years ago
- Kafka-Connect SMT (Single Message Transformations) with SQL syntax (Using Apache Calcite for the SQL parsing) ☆33 · Apr 16, 2020 · Updated 5 years ago
- SQLiteFlow Support ☆13 · Oct 31, 2022 · Updated 3 years ago
- Fabric-based framework for deploying and managing SolrCloud clusters in the cloud. ☆90 · Mar 19, 2019 · Updated 6 years ago
- ☆10 · Jul 1, 2022 · Updated 3 years ago