ExpediaGroup/datasqueeze

ExpediaGroup / datasqueeze

Hadoop utility to compact small files

☆18

Alternatives and similar repositories for datasqueeze

Users who are interested in datasqueeze compare it to the libraries listed below.

ExpediaGroup / drone-fly
A service which allows Hive Metastore Listeners to be deployed outside of the Hive Metastore Service
☆13 · Jun 30, 2026 · Updated 3 weeks ago
edwardcapriolo / filecrush
Remedy small files by combining them into larger ones.
☆196 · Jul 1, 2022 · Updated 4 years ago
ExpediaGroup / insights-explorer
Insights Explorer is a tool to catalogue and present analytical & research work.
☆14 · Nov 26, 2024 · Updated last year
ExpediaGroup / circus-train
Circus Train is a dataset replication tool that copies Hive tables between clusters and clouds.
☆93 · Mar 5, 2024 · Updated 2 years ago
ExpediaGroup / apiary-data-lake
Terraform scripts for deploying Apiary Data Lake
☆19 · Apr 16, 2026 · Updated 3 months ago
ExpediaGroup / hiveberg
Demonstration of a Hive Input Format for Iceberg
☆26 · Mar 12, 2021 · Updated 5 years ago
ExpediaGroup / beekeeper
Service for automatically managing and cleaning up unreferenced data
☆50 · Apr 24, 2026 · Updated 2 months ago
HiveRunner / mutant-swarm
Mutation testing framework and code coverage for Hive SQL
☆24 · May 11, 2021 · Updated 5 years ago
ExpediaGroup / apiary
Apiary provides modules which can be combined to create a federated cloud data lake
☆38 · Apr 3, 2024 · Updated 2 years ago
ExpediaGroup / waggle-dance
Hive federation service. Enables disparate tables to be concurrently accessed across multiple Hive deployments.
☆288 · Jun 25, 2026 · Updated 3 weeks ago
KeithSSmith / spark-compaction
File compaction tool that runs on top of the Spark framework.
☆59 · Apr 17, 2019 · Updated 7 years ago
flipkart-incubator / pulsar-weighted-consumer
Pulsar consumer clients offering priority consumption
☆12 · Mar 17, 2023 · Updated 3 years ago
724686158 / China-ICD-10
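3-character code category table; 6-character extended code table; Classification and Codes of Diseases (revised edition); chapter names and codes
☆11 · Aug 20, 2018 · Updated 7 years ago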
Observe-secretly / AutoRefreshImpala
Automatically refreshes Impala metadata. Intended for users of Impala versions below 3.2, which lack automatic metadata refresh.
☆11 · Jul 27, 2021 · Updated 4 years ago
liyifeng / flumeUserGuideCnDoc
Chinese translation of the Flume User Guide
☆12 · Dec 4, 2023 · Updated 2 years ago
dstreev / cloudera_upgrade_utils
Various tools to help plan HDP and CDH upgrades to CDP
☆13 · Dec 7, 2021 · Updated 4 years ago
igr / jrsmq
A lightweight message queue for Java that requires no dedicated queue server. Just a Redis server.
☆37 · Sep 9, 2021 · Updated 4 years ago
verils / gotemplate4j
A Go template engine implementation for Java that evaluates Go templates and generates textual output.
☆18 · May 25, 2026 · Updated last month
Middlecon / DBImport
DBImport ingestion tool. Handle import, export and standard ETL flows in Hadoop/Hive
☆19 · Feb 17, 2026 · Updated 5 months ago
digoal / pg_tpch
TPC-H like benchmark for PostgreSQL
☆16 · Feb 14, 2016 · Updated 10 years ago
openmessaging / openmessaging.github.io
OpenMessaging homepage
☆13 · Mar 24, 2024 · Updated 2 years ago
epiphanous / flinkrunner
A library to support building a coherent set of flink jobs
☆17 · Oct 5, 2024 · Updated last year
sundeck-io / qtag
QTag: Turbocharge Your SQL Comments
☆12 · Jan 30, 2025 · Updated last year
ExpediaGroup / stream-registry
Stream Discovery and Stream Orchestration
☆124 · Jan 7, 2026 · Updated 6 months ago
ExpediaGroup / corc
An ORC File Scheme for the Cascading data processing platform.
☆14 · Aug 26, 2021 · Updated 4 years ago
yurisasuke / go-taskq
A simple golang job queue
☆13 · Jan 19, 2023 · Updated 3 years ago
amazon-ion / ion-hive-serde
An Apache Hive SerDe (short for serializer/deserializer) for the Ion file format.
☆31 · Mar 27, 2025 · Updated last year
daiwei233 / hbase_exporter
HBase exporter that fetches region-level data from JMX.
☆17 · Jun 27, 2026 · Updated 3 weeks ago
ExpediaGroup / jasvorno
A library for strong, schema based conversion between 'natural' JSON documents and Avro
☆18 · Mar 5, 2024 · Updated 2 years ago
minheq / monorepo-cra-source-map
Monorepo/Yarn Workspaces/Lerna with Create React App in TypeScript that produces sourcemap for VSCode debugging and Sentry reports
☆11 · Apr 7, 2023 · Updated 3 years ago
triggan / neptune-workshop-ui
☆15 · Mar 31, 2026 · Updated 3 months ago
ypt / experiment-flink-cdc-connectors
An exploration of Flink and change-data-capture via flink-cdc-connectors
☆11 · Jul 7, 2021 · Updated 5 years ago
dounine / spark-sql-datasource
jdbc2 datasource supporting DUPLICATE KEY increment
☆19 · Nov 25, 2020 · Updated 5 years ago
rayokota / keta
A Transactional Metadata Store Backed by Apache Kafka
☆25 · Sep 22, 2025 · Updated 9 months ago
yuxingfirst / tcputil
A TCP/IP utility library for Go
☆19 · Jul 7, 2013 · Updated 13 years ago
balamaci / muninn
Java Alerting Framework for ElasticSearch
☆12 · May 20, 2016 · Updated 10 years ago
paypal / NNAnalytics
NameNodeAnalytics is a self-help utility for scouting and maintaining the namespace of an HDFS instance.
☆121 · Nov 25, 2025 · Updated 7 months ago
Karasiq / proxyutils
Scala HTTP/SOCKS proxy library, based on akka-streams
☆10 · Nov 3, 2018 · Updated 7 years ago
asdaraujo / filecrush
Remedy small files by combining them into larger ones.
☆23 · Oct 31, 2018 · Updated 7 years ago