Remedy small files by combining them into larger ones.
☆195 · Jul 1, 2022 · Updated 3 years ago
Alternatives and similar repositories for filecrush
Users that are interested in filecrush are comparing it to the libraries listed below.
- Sample Python code for working with the HBase REST interface ☆24 · Jul 25, 2013 · Updated 12 years ago
- Jumbune, an open source BigData APM & Data Quality Management Platform for Data Clouds. Enterprise feature offering is available at http:… ☆73 · Jan 1, 2023 · Updated 3 years ago
- Memory / Configuration Calculator for Hive LLAP ☆14 · Jul 18, 2020 · Updated 5 years ago
- functionstest ☆33 · Oct 25, 2016 · Updated 9 years ago
- A complete custom processor project, for your reference. ☆17 · Sep 29, 2015 · Updated 10 years ago
- Mahout vector encoding for pig ☆53 · Nov 20, 2022 · Updated 3 years ago
- Example code for "Web-Scale Computer Vision using MapReduce for Multimedia Data Mining" ☆48 · Aug 2, 2010 · Updated 15 years ago
- NiFi Dynamic Script Executors ☆15 · Jul 17, 2016 · Updated 9 years ago
- NEW: see http://www.hops.io/. OLD: This work aims to re-engineer the Hadoop Distributed File System (HDFS) so that it can be 1) highly av… ☆26 · Jan 2, 2012 · Updated 14 years ago
- A set of examples and utilities for using Pig with Cassandra. For the latest jar release, check the Downloads link. ☆84 · Aug 21, 2014 · Updated 11 years ago
- Interactive Audience Analytics with Spark and HyperLogLog ☆55 · Oct 14, 2015 · Updated 10 years ago
- Hadoop library for large-scale data processing, now an Apache Incubator project ☆581 · Jul 8, 2014 · Updated 11 years ago
- An Ansible collection of utilities and other resources for Cloudera Platform deployments ☆13 · Updated this week
- Pig on Apache Spark ☆82 · Mar 23, 2015 · Updated 11 years ago
- An app built on Cloudera Enterprise for tracking metrics of jobs that run in YARN framework ☆13 · Feb 5, 2016 · Updated 10 years ago
- An application to monitor and drive the Spark JobServer ☆12 · Dec 12, 2014 · Updated 11 years ago
- Chinese translation of "The world beyond batch: Streaming 102" ☆17 · Mar 17, 2017 · Updated 9 years ago
- File compaction tool that runs on top of the Spark framework. ☆59 · Apr 17, 2019 · Updated 7 years ago
- A bunch of utility classes for Java, Hadoop, HBase, Pig, etc. ☆77 · Mar 31, 2014 · Updated 12 years ago
- Flink China Doc & Blog | Markdown Support & Auto Deploy ☆13 · Sep 4, 2020 · Updated 5 years ago
- New IndexFS core ☆27 · Mar 28, 2016 · Updated 10 years ago
- Tools for spark which we use on the daily basis ☆65 · Jul 2, 2020 · Updated 5 years ago
- A wrapper for Hadoop in Scala ☆42 · Jul 18, 2010 · Updated 15 years ago
- Using Apache Spark in an ArcMap Toolbox ☆27 · Jan 16, 2014 · Updated 12 years ago
- ☆16 · Nov 8, 2015 · Updated 10 years ago
- Tool for gathering blocks and replicas meta data from HDFS. It also builds a heat map showing how replicas are distributed along disks an… ☆55 · May 9, 2017 · Updated 9 years ago
- Few things we've met during our etl project based on spark ☆24 · Mar 22, 2018 · Updated 8 years ago
- A distributed data integration framework that simplifies common aspects of big data integration such as data ingestion, replication, orga… ☆2,264 · Apr 22, 2026 · Updated 2 weeks ago
- Kaltura's next generation Analytics solution based on Spark, Cassandra and Kafka ☆12 · Mar 31, 2023 · Updated 3 years ago
- ☆24 · Feb 4, 2021 · Updated 5 years ago
- Spark job for compacting avro files together ☆12 · Jan 26, 2018 · Updated 8 years ago
- sample oozie workflows ☆17 · Jun 13, 2017 · Updated 8 years ago
- Smart Storage Management for Big Data, a comprehensive hot/cold data optimized solution ☆140 · Jan 3, 2023 · Updated 3 years ago
- Mirror of Apache Eagle ☆411 · Aug 22, 2020 · Updated 5 years ago
- Low level integration of Spark and Kafka ☆131 · Mar 15, 2018 · Updated 8 years ago
- an impala client for ruby ☆34 · Jan 25, 2017 · Updated 9 years ago
- Collection of HDP Tuning Tricks & Tips (unofficial guide) ☆17 · Sep 26, 2017 · Updated 8 years ago
- ☆110 · Apr 17, 2017 · Updated 9 years ago
- Ansible playbooks for deploying Hortonworks Data Platform ☆128 · Dec 15, 2020 · Updated 5 years ago