Hadoop utility to compact small files
☆18Feb 16, 2026Updated last week
Alternatives and similar repositories for datasqueeze
Users that are interested in datasqueeze are comparing it to the libraries listed below
- Insights Explorer is a tool to catalogue and present analytical & research work.☆13Nov 26, 2024Updated last year
- A service which allows Hive Metastore Listeners to be deployed outside of the Hive Metastore Service☆13Jul 23, 2025Updated 7 months ago
- ☆14Oct 17, 2022Updated 3 years ago
- Service for automatically managing and cleaning up unreferenced data☆49Aug 6, 2025Updated 6 months ago
- Circus Train is a dataset replication tool that copies Hive tables between clusters and clouds.☆92Mar 5, 2024Updated last year
- Demonstration of a Hive Input Format for Iceberg☆26Mar 12, 2021Updated 4 years ago
- Remedy small files by combining them into larger ones.☆194Jul 1, 2022Updated 3 years ago
- File compaction tool that runs on top of the Spark framework.☆59Apr 17, 2019Updated 6 years ago
- Advanced block device testing/file system testing, targeting SNIA-compatible reporting☆12Oct 15, 2025Updated 4 months ago
- Apiary provides modules which can be combined to create a federated cloud data lake☆37Apr 3, 2024Updated last year
- Hive federation service. Enables disparate tables to be concurrently accessed across multiple Hive deployments.☆284Feb 18, 2026Updated last week
- Use Terraform outputs in your Ruby code.☆11Feb 5, 2020Updated 6 years ago
- Python wrappers for the FirecREST API☆12Dec 23, 2025Updated 2 months ago
- Lustre Repository with MS patches☆13Feb 21, 2026Updated last week
- HADOOP-CLI is an interactive command line shell that makes interacting with the Hadoop Distributed Filesystem (HDFS) simpler and more intu…☆36Feb 2, 2026Updated 3 weeks ago
- A lightweight message queue for Java that requires no dedicated queue server. Just a Redis server.☆36Sep 9, 2021Updated 4 years ago
- Java event logs collector for hadoop and frameworks☆41Mar 25, 2025Updated 11 months ago
- seckill flash-sale project [PRC]☆10Apr 13, 2019Updated 6 years ago
- Integration of Iceberg table management into Spark SQL☆11Jan 21, 2020Updated 6 years ago
- Flink connector for JDBC with support for database and table sharding☆10Dec 31, 2021Updated 4 years ago
- A timer module for Redis☆11Oct 16, 2019Updated 6 years ago
- extended benchmarking automation tool for HPC applications☆16Updated this week
- An exploration of Flink and change-data-capture via flink-cdc-connectors☆11Jul 7, 2021Updated 4 years ago
- Lustre HSM tools☆10Feb 19, 2024Updated 2 years ago
- Auto detection of apt proxies in the LAN, caching and checking status☆10Feb 13, 2025Updated last year
- Cloyster HPC is a turnkey HPC cluster solution with a user-friendly installer☆10Oct 2, 2025Updated 4 months ago
- Second generation of the ICGC DCC release ETL built on Spark☆10Apr 8, 2019Updated 6 years ago
- Examples for eclairjs-node and eclairjs-nashorn☆12Jan 24, 2017Updated 9 years ago
- ☆10Aug 13, 2021Updated 4 years ago
- Ruby client for temp-mail.ru☆11Jan 31, 2017Updated 9 years ago
- A Spark datasource for the HadoopOffice library☆36Sep 29, 2025Updated 5 months ago
- A light Kafka to HDFS/S3 ETL library based on Apache Spark☆40Jun 29, 2017Updated 8 years ago
- Library for HTTP request signing (Ruby implementation)☆12Jul 23, 2025Updated 7 months ago
- Demo for scalable Elasticsearch setups with Frozen Indices, Index Lifecycle Management, and Rollups☆12Oct 17, 2020Updated 5 years ago
- Exposes Redis stream through the command line☆12Jun 28, 2022Updated 3 years ago
- Tool to profile usage of HPC resources by regularly probing processes.☆11Updated this week
- SaltStack states commonly used in DevOps☆10Feb 2, 2026Updated 3 weeks ago
- Windows EventLog hooks for Logrus☆10Jun 4, 2025Updated 8 months ago
- Telegram bot which knows IPv6 excuses.☆11Mar 24, 2018Updated 7 years ago