Remedy small files by combining them into larger ones.
☆195 · Jul 1, 2022 · Updated 3 years ago
Alternatives and similar repositories for filecrush
Users that are interested in filecrush are comparing it to the libraries listed below.
- Remedy small files by combining them into larger ones. ☆23 · Oct 31, 2018 · Updated 7 years ago
- Jumbune, an open source BigData APM & Data Quality Management Platform for Data Clouds. Enterprise feature offering is available at http:… ☆73 · Jan 1, 2023 · Updated 3 years ago
- Toolkit of simple scripts useful for managing Hadoop ☆17 · Mar 31, 2011 · Updated 15 years ago
- functionstest ☆33 · Oct 25, 2016 · Updated 9 years ago
- Mahout vector encoding for pig ☆53 · Nov 20, 2022 · Updated 3 years ago
- NiFi Dynamic Script Executors ☆15 · Jul 17, 2016 · Updated 9 years ago
- ☆10 · Aug 26, 2025 · Updated 9 months ago
- NEW: see http://www.hops.io/. OLD: This work aims to re-engineer the Hadoop Distributed File System (HDFS) so that it can be 1) highly av… ☆26 · Jan 2, 2012 · Updated 14 years ago
- A set of examples and utilities for using Pig with Cassandra. For the latest jar release, check the Downloads link. ☆84 · Aug 21, 2014 · Updated 11 years ago
- Interactive Audience Analytics with Spark and HyperLogLog ☆55 · Oct 14, 2015 · Updated 10 years ago
- Hadoop library for large-scale data processing, now an Apache Incubator project ☆581 · Jul 8, 2014 · Updated 11 years ago
- An Ansible collection of utilities and other resources for Cloudera Platform deployments ☆13 · May 4, 2026 · Updated 3 weeks ago
- Pig on Apache Spark ☆82 · Mar 23, 2015 · Updated 11 years ago
- An app built on Cloudera Enterprise for tracking metrics of jobs that run in YARN framework ☆13 · Feb 5, 2016 · Updated 10 years ago
- An application to monitor and drive the Spark JobServer ☆11 · Dec 12, 2014 · Updated 11 years ago
- The world beyond batch: Streaming 102 (Chinese translation) ☆17 · Mar 17, 2017 · Updated 9 years ago
- Opinionated Data Pipelines + Business Analytics Done Right :: The Last Mile on the Data Cloud. ☆23 · Aug 1, 2022 · Updated 3 years ago
- File compaction tool that runs on top of the Spark framework. ☆58 · Apr 17, 2019 · Updated 7 years ago
- A bunch of utility classes for Java, Hadoop, HBase, Pig, etc. ☆77 · Mar 31, 2014 · Updated 12 years ago
- Flink China Doc & Blog | Markdown Support & Auto Deploy ☆13 · Sep 4, 2020 · Updated 5 years ago
- Tools for spark which we use on the daily basis ☆65 · Jul 2, 2020 · Updated 5 years ago
- A wrapper for Hadoop in Scala ☆42 · Jul 18, 2010 · Updated 15 years ago
- Tool for gathering blocks and replicas meta data from HDFS. It also builds a heat map showing how replicas are distributed along disks an… ☆55 · May 9, 2017 · Updated 9 years ago
- ☆17 · Jan 25, 2017 · Updated 9 years ago
- A Pelican plugin to generate PDF resumes automatically from a Pelican page in Markdown ☆11 · Feb 8, 2016 · Updated 10 years ago
- A distributed data integration framework that simplifies common aspects of big data integration such as data ingestion, replication, orga… ☆2,263 · May 19, 2026 · Updated last week
- ☆48 · Apr 12, 2022 · Updated 4 years ago
- Kaltura's next generation Analytics solution based on Spark, Cassandra and Kafka ☆12 · Mar 31, 2023 · Updated 3 years ago
- ☆24 · Feb 4, 2021 · Updated 5 years ago
- Spark job for compacting avro files together ☆12 · Jan 26, 2018 · Updated 8 years ago
- sample oozie workflows ☆17 · Jun 13, 2017 · Updated 8 years ago
- Smart Storage Management for Big Data, a comprehensive hot/cold data optimized solution ☆140 · Jan 3, 2023 · Updated 3 years ago
- Mirror of Apache Eagle ☆411 · Aug 22, 2020 · Updated 5 years ago
- Low level integration of Spark and Kafka ☆131 · Mar 15, 2018 · Updated 8 years ago
- Erlang/Elixir Release Assembler ☆59 · Apr 21, 2014 · Updated 12 years ago
- Patched, refactored version of code.google.com/hadoop-gpl-compression for hadoop 0.20 ☆37 · Aug 13, 2012 · Updated 13 years ago
- an impala client for ruby ☆34 · Jan 25, 2017 · Updated 9 years ago
- Spark Streaming HBase Example ☆22 · May 20, 2026 · Updated last week
- Collection of HDP Tuning Tricks & Tips (unofficial guide) ☆17 · Sep 26, 2017 · Updated 8 years ago