File compaction tool that runs on top of the Spark framework.
☆59 · Apr 17, 2019 · Updated 7 years ago
Alternatives and similar repositories for spark-compaction
Users that are interested in spark-compaction are comparing it to the libraries listed below.
- Remedy small files by combining them into larger ones. ☆23 · Oct 31, 2018 · Updated 7 years ago
- Hadoop utility to compact small files ☆18 · Feb 16, 2026 · Updated 4 months ago
- Data Quality Monitoring Tool ☆15 · Dec 5, 2017 · Updated 8 years ago
- Build configuration-driven ETL pipelines on Apache Spark ☆162 · Oct 4, 2022 · Updated 3 years ago
- Atomix Jepsen tests ☆14 · Feb 7, 2017 · Updated 9 years ago
- Kafka Examples repository. ☆44 · Feb 5, 2019 · Updated 7 years ago
- High performance HBase / Spark SQL engine ☆28 · Jul 7, 2022 · Updated 3 years ago
- ☆26 · Dec 18, 2019 · Updated 6 years ago
- Web forms generator from Avro schemas ☆13 · Jan 23, 2017 · Updated 9 years ago
- Liquibase extension to add Impala Database support ☆24 · Mar 8, 2022 · Updated 4 years ago
- Scripts to demonstrate VPC Service Controls between tenant and shared projects ☆12 · Jun 11, 2019 · Updated 7 years ago
- Hadoop InputFormat for http://druid.io/ ☆10 · Oct 26, 2016 · Updated 9 years ago
- The Schema Repo is a RESTful web service for storing and serving mappings between schema identifiers and schema definitions. ☆155 · Jul 7, 2022 · Updated 3 years ago
- Instructions for setting up Kerberos, Zookeeper, and Kafka with SASL ☆16 · Jan 22, 2018 · Updated 8 years ago
- A dynamic data completeness and accuracy library at enterprise scale for Apache Spark ☆30 · May 13, 2026 · Updated last month
- Maelstrom is an open source Kafka integration with Spark that is designed to be developer friendly, high performance (millisecond stream … ☆21 · Feb 6, 2017 · Updated 9 years ago
- ☆12 · Apr 7, 2025 · Updated last year
- Simple example of Solr Block Joins between Parents and Children, implemented in SolrJ ☆22 · Jul 2, 2014 · Updated 11 years ago
- A tool for data sampling, data generation, and data diffing ☆349 · Mar 31, 2026 · Updated 3 months ago
- [student project] UI to run SQL on Delta Lake tables and visualize the variations of the result among tables versions ☆12 · Apr 21, 2020 · Updated 6 years ago
- Serializes RDF from a SPARQL endpoint to JSON-LD documents ☆10 · Sep 11, 2018 · Updated 7 years ago
- Probabilistic programming in Scala ☆34 · Jan 28, 2014 · Updated 12 years ago
- Quick module to deploy a Linux VM to Azure with Ansible installed at bootup - by @JesseLoudon ☆11 · Apr 4, 2026 · Updated 2 months ago
- Multi-stage, config driven, SQL based ETL framework using PySpark ☆26 · Sep 16, 2019 · Updated 6 years ago
- A table-type dbt materialization for Snowflake to enable Time Travel ☆22 · Jan 12, 2026 · Updated 5 months ago
- Simple Spark example of generating table stats for use of data quality checks ☆27 · Apr 28, 2017 · Updated 9 years ago
- Deploying a simple, customized Flask API in python via Google App Engine ☆13 · Aug 20, 2017 · Updated 8 years ago
- Camus Compressor merges files created by Camus and saves them in a compressed format. ☆13 · Mar 20, 2023 · Updated 3 years ago
- Code examples for my blog posts ☆22 · Nov 7, 2018 · Updated 7 years ago
- ☆11 · Aug 14, 2014 · Updated 11 years ago
- A lightweight mapping framework that maps data objects to a number of nodes, subject to constraints ☆96 · Mar 16, 2017 · Updated 9 years ago
- Delta Lake Examples ☆11 · Apr 24, 2020 · Updated 6 years ago
- ☆10 · Aug 2, 2021 · Updated 4 years ago
- Kafka to Avro Writer based on Apache Beam. It's a generic solution that reads data from multiple kafka topics and stores it on in cloud s… ☆25 · Apr 7, 2021 · Updated 5 years ago
- Java implementation of the SCRAM SASL for both server and client plus examples ☆17 · Apr 18, 2021 · Updated 5 years ago
- A ZooKeeper client library in Scala. ☆21 · Apr 17, 2013 · Updated 13 years ago
- This library enables to use ZooKeeper as cluster coordinator in a ConstructR based cluster ☆12 · Dec 2, 2017 · Updated 8 years ago
- hello-streams :: Introducing the stream-first mindset ☆16 · Mar 5, 2024 · Updated 2 years ago
- ☆242 · Jun 14, 2018 · Updated 8 years ago