Remedy small files by combining them into larger ones.
☆195 · Jul 1, 2022 · Updated 3 years ago
Alternatives and similar repositories for filecrush
Users that are interested in filecrush are comparing it to the libraries listed below.
- Hadoop utility to compact small files ☆18 · Feb 16, 2026 · Updated last month
- SQL Windowing Functions for Hadoop ☆65 · Jun 20, 2022 · Updated 3 years ago
- Remedy small files by combining them into larger ones. ☆23 · Oct 31, 2018 · Updated 7 years ago
- Sample Python code for working with the HBase REST interface ☆24 · Jul 25, 2013 · Updated 12 years ago
- Jumbune, an open source BigData APM & Data Quality Management Platform for Data Clouds. Enterprise feature offering is available at http:… ☆72 · Jan 1, 2023 · Updated 3 years ago
- Toolkit of simple scripts useful for managing Hadoop ☆17 · Mar 31, 2011 · Updated 14 years ago
- Memory / Configuration Calculator for Hive LLAP ☆14 · Jul 18, 2020 · Updated 5 years ago
- functionstest ☆33 · Oct 25, 2016 · Updated 9 years ago
- A complete custom processor project, for your reference. ☆17 · Sep 29, 2015 · Updated 10 years ago
- Mahout vector encoding for pig ☆53 · Nov 20, 2022 · Updated 3 years ago
- NiFi Dynamic Script Executors ☆15 · Jul 17, 2016 · Updated 9 years ago
- ☆10 · Aug 26, 2025 · Updated 7 months ago
- NEW: see http://www.hops.io/. OLD: This work aims to re-engineer the Hadoop Distributed File System (HDFS) so that it can be 1) highly av… ☆26 · Jan 2, 2012 · Updated 14 years ago
- A set of examples and utilities for using Pig with Cassandra. For the latest jar release, check the Downloads link. ☆84 · Aug 21, 2014 · Updated 11 years ago
- Interactive Audience Analytics with Spark and HyperLogLog ☆55 · Oct 14, 2015 · Updated 10 years ago
- Hadoop library for large-scale data processing, now an Apache Incubator project ☆581 · Jul 8, 2014 · Updated 11 years ago
- Pig on Apache Spark ☆82 · Mar 23, 2015 · Updated 11 years ago
- KDC for Cloudbreak provisioned Hadoop clusters ☆15 · Aug 15, 2021 · Updated 4 years ago
- An application to monitor and drive the Spark JobServer ☆12 · Dec 12, 2014 · Updated 11 years ago
- Opinionated Data Pipelines + Business Analytics Done Right :: The Last Mile on the Data Cloud. ☆23 · Aug 1, 2022 · Updated 3 years ago
- Build configuration-driven ETL pipelines on Apache Spark ☆162 · Oct 4, 2022 · Updated 3 years ago
- File compaction tool that runs on top of the Spark framework. ☆59 · Apr 17, 2019 · Updated 6 years ago
- A bunch of utility classes for Java, Hadoop, HBase, Pig, etc. ☆76 · Mar 31, 2014 · Updated 11 years ago
- Flink China Doc & Blog | Markdown Support & Auto Deploy ☆13 · Sep 4, 2020 · Updated 5 years ago
- A wrapper for Hadoop in Scala ☆42 · Jul 18, 2010 · Updated 15 years ago
- ☆16 · Nov 8, 2015 · Updated 10 years ago
- Tool for gathering blocks and replicas meta data from HDFS. It also builds a heat map showing how replicas are distributed along disks an… ☆55 · May 9, 2017 · Updated 8 years ago
- ☆17 · Jan 25, 2017 · Updated 9 years ago
- A big data platform monitoring tool based on ELK stack ☆20 · Feb 16, 2020 · Updated 6 years ago
- A Pelican plugin to generate PDF resumes automatically from a Pelican page in Markdown ☆11 · Feb 8, 2016 · Updated 10 years ago
- Few things we've met during our etl project based on spark ☆24 · Mar 22, 2018 · Updated 8 years ago
- Nginx module for etags on dynamic content ☆37 · Feb 3, 2016 · Updated 10 years ago
- A distributed data integration framework that simplifies common aspects of big data integration such as data ingestion, replication, orga… ☆2,260 · Updated this week
- ☆48 · Apr 12, 2022 · Updated 3 years ago
- Kaltura's next generation Analytics solution based on Spark, Cassandra and Kafka ☆12 · Mar 31, 2023 · Updated 2 years ago
- Spark job for compacting avro files together ☆12 · Jan 26, 2018 · Updated 8 years ago
- sample oozie workflows ☆17 · Jun 13, 2017 · Updated 8 years ago
- Smart Storage Management for Big Data, a comprehensive hot/cold data optimized solution ☆141 · Jan 3, 2023 · Updated 3 years ago
- Mirror of Apache Eagle ☆410 · Aug 22, 2020 · Updated 5 years ago