Remedy small files by combining them into larger ones.
☆195 · Jul 1, 2022 · Updated 3 years ago
Alternatives and similar repositories for filecrush
Users that are interested in filecrush are comparing it to the libraries listed below.
- Hadoop utility to compact small files ☆18 · Feb 16, 2026 · Updated 4 months ago
- SQL Windowing Functions for Hadoop ☆65 · Jun 20, 2022 · Updated 3 years ago
- Remedy small files by combining them into larger ones. ☆23 · Oct 31, 2018 · Updated 7 years ago
- Jumbune, an open source BigData APM & Data Quality Management Platform for Data Clouds. Enterprise feature offering is available at http:… ☆73 · Jan 1, 2023 · Updated 3 years ago
- Toolkit of simple scripts useful for managing Hadoop ☆17 · Mar 31, 2011 · Updated 15 years ago
- Memory / Configuration Calculator for Hive LLAP ☆14 · Jul 18, 2020 · Updated 5 years ago
- functionstest ☆33 · Oct 25, 2016 · Updated 9 years ago
- Mahout vector encoding for pig ☆53 · Nov 20, 2022 · Updated 3 years ago
- Example code for "Web-Scale Computer Vision using MapReduce for Multimedia Data Mining" ☆48 · Aug 2, 2010 · Updated 15 years ago
- ☆10 · Aug 26, 2025 · Updated 9 months ago
- NEW: see http://www.hops.io/. OLD: This work aims to re-engineer the Hadoop Distributed File System (HDFS) so that it can be 1) highly av… ☆26 · Jan 2, 2012 · Updated 14 years ago
- A set of examples and utilities for using Pig with Cassandra. For the latest jar release, check the Downloads link. ☆84 · Aug 21, 2014 · Updated 11 years ago
- Interactive Audience Analytics with Spark and HyperLogLog ☆55 · Oct 14, 2015 · Updated 10 years ago
- Hadoop library for large-scale data processing, now an Apache Incubator project ☆581 · Jul 8, 2014 · Updated 11 years ago
- An Ansible collection of utilities and other resources for Cloudera Platform deployments ☆13 · Jun 11, 2026 · Updated last week
- Pig on Apache Spark ☆82 · Mar 23, 2015 · Updated 11 years ago
- An app built on Cloudera Enterprise for tracking metrics of jobs that run in YARN framework ☆13 · Feb 5, 2016 · Updated 10 years ago
- KDC for Cloudbreak provisioned Hadoop clusters ☆15 · Aug 15, 2021 · Updated 4 years ago
- An application to monitor and drive the Spark JobServer ☆11 · Dec 12, 2014 · Updated 11 years ago
- Cosine Similary Search in ElasticSearch + FAISS GPU ☆12 · Mar 24, 2022 · Updated 4 years ago
- Chinese translation of "The world beyond batch: Streaming 102" ☆16 · Mar 17, 2017 · Updated 9 years ago
- Opinionated Data Pipelines + Business Analytics Done Right :: The Last Mile on the Data Cloud. ☆23 · Aug 1, 2022 · Updated 3 years ago
- Luigi Workflow Engine integration for Treasure Data ☆16 · May 14, 2018 · Updated 8 years ago
- Build configuration-driven ETL pipelines on Apache Spark ☆162 · Oct 4, 2022 · Updated 3 years ago
- File compaction tool that runs on top of the Spark framework. ☆59 · Apr 17, 2019 · Updated 7 years ago
- A bunch of utility classes for Java, Hadoop, HBase, Pig, etc. ☆77 · Mar 31, 2014 · Updated 12 years ago
- Flink China Doc & Blog | Markdown Support & Auto Deploy ☆13 · Sep 4, 2020 · Updated 5 years ago
- Tools for spark which we use on the daily basis ☆65 · Jul 2, 2020 · Updated 5 years ago
- A wrapper for Hadoop in Scala ☆42 · Jul 18, 2010 · Updated 15 years ago
- Using Apache Spark in an ArcMap Toolbox ☆27 · Jan 16, 2014 · Updated 12 years ago
- ☆16 · Nov 8, 2015 · Updated 10 years ago
- Tool for gathering blocks and replicas meta data from HDFS. It also builds a heat map showing how replicas are distributed along disks an… ☆55 · May 9, 2017 · Updated 9 years ago
- A big data platform monitoring tool based on ELK stack ☆20 · Feb 16, 2020 · Updated 6 years ago
- A Pelican plugin to generate PDF resumes automatically from a Pelican page in Markdown ☆11 · Feb 8, 2016 · Updated 10 years ago
- Few things we've met during our etl project based on spark ☆24 · Mar 22, 2018 · Updated 8 years ago
- A distributed data integration framework that simplifies common aspects of big data integration such as data ingestion, replication, orga… ☆2,267 · Updated this week
- ☆48 · Apr 12, 2022 · Updated 4 years ago
- Kaltura's next generation Analytics solution based on Spark, Cassandra and Kafka ☆12 · Mar 31, 2023 · Updated 3 years ago
- ☆24 · Feb 4, 2021 · Updated 5 years ago