edwardcapriolo/filecrush

Remedy small files by combining them into larger ones.

☆195

Alternatives and similar repositories for filecrush

Users that are interested in filecrush are comparing it to the libraries listed below.

Impetus / jumbune
Jumbune, an open source BigData APM & Data Quality Management Platform for Data Clouds. Enterprise feature offering is available at http:…
☆73 • Jan 1, 2023 • Updated 3 years ago
cloudera / haatkit
Toolkit of simple scripts useful for managing Hadoop
☆17 • Mar 31, 2011 • Updated 15 years ago
dstreev / hive_llap_calculator
Memory / Configuration Calculator for Hive LLAP
☆14 • Jul 18, 2020 • Updated 5 years ago
aperepel / nifi-workshop
A complete custom processor project, for your reference.
☆17 • Sep 29, 2015 • Updated 10 years ago
tdunning / pig-vector
Mahout vector encoding for pig
☆53 • Nov 20, 2022 • Updated 3 years ago
xmlking / nifi-scripting
NiFi Dynamic Script Executors
☆15 • Jul 17, 2016 • Updated 9 years ago
davidgin / draw_polygons_on_private_map
☆10 • Aug 26, 2025 • Updated 10 months ago
dvryaboy / pig
Mirror of Apache Pig
☆18 • Jul 9, 2013 • Updated 13 years ago
lalithsuresh / Scaling-HDFS-NameNode
NEW: see http://www.hops.io/. OLD: This work aims to re-engineer the Hadoop Distributed File System (HDFS) so that it can be 1) highly av…
☆26 • Jan 2, 2012 • Updated 14 years ago
jeromatron / pygmalion
A set of examples and utilities for using Pig with Cassandra. For the latest jar release, check the Downloads link.
☆84 • Aug 21, 2014 • Updated 11 years ago
collectivemedia / spark-hyperloglog
Interactive Audience Analytics with Spark and HyperLogLog
☆55 • Oct 14, 2015 • Updated 10 years ago
LinkedInAttic / datafu
Hadoop library for large-scale data processing, now an Apache Incubator project
☆581 • Jul 8, 2014 • Updated 12 years ago
yeleid / eagleeye
An app built on Cloudera Enterprise for tracking metrics of jobs that run in YARN framework
☆13 • Feb 5, 2016 • Updated 10 years ago
sequenceiq / docker-kerberos
KDC for Cloudbreak provisioned Hadoop clusters
☆15 • Aug 15, 2021 • Updated 4 years ago
spark-jobserver / spark-jobserver-frontend
An application to monitor and drive the Spark JobServer
☆11 • Dec 12, 2014 • Updated 11 years ago
JamesMcGuigan / elasticsearch-faiss-cosine-similarity-search
Cosine Similarity Search in ElasticSearch + FAISS GPU
☆12 • Mar 24, 2022 • Updated 4 years ago
jdong32 / Streaming102
Chinese translation of "The world beyond batch: Streaming 102"
☆16 • Mar 17, 2017 • Updated 9 years ago
cloudera-labs / envelope
Build configuration-driven ETL pipelines on Apache Spark
☆162 • Oct 4, 2022 • Updated 3 years ago
KeithSSmith / spark-compaction
File compaction tool that runs on top of the Spark framework.
☆59 • Apr 17, 2019 • Updated 7 years ago
flink-china / doc
Flink China Doc & Blog | Markdown Support & Auto Deploy
☆13 • Sep 4, 2020 • Updated 5 years ago
mozilla-metrics / akela
A bunch of utility classes for Java, Hadoop, HBase, Pig, etc.
☆77 • Mar 31, 2014 • Updated 12 years ago
zhengqmark / indexfs-0.4
New IndexFS core
☆27 • Mar 28, 2016 • Updated 10 years ago
RetailRocket / SparkMultiTool
Tools for spark which we use on the daily basis
☆65 • Jul 2, 2020 • Updated 6 years ago
bsdfish / ScalaHadoop
A wrapper for Hadoop in Scala
☆42 • Jul 18, 2010 • Updated 15 years ago
onefoursix / kill-long-running-impala-queries
☆16 • Nov 8, 2015 • Updated 10 years ago
cerndb / hdfs-metadata
Tool for gathering blocks and replicas meta data from HDFS. It also builds a heat map showing how replicas are distributed along disks an…
☆55 • May 9, 2017 • Updated 9 years ago
IgorBerman / spark-gotchas
Few things we've met during our etl project based on spark
☆24 • Mar 22, 2018 • Updated 8 years ago
xanpeng / libcrush
Crush algorithm from Ceph (http://ceph.com/)
☆10 • Nov 10, 2014 • Updated 11 years ago
cmenguy / pelican-resume
A Pelican plugin to generate PDF resumes automatically from a Pelican page in Markdown
☆11 • Feb 8, 2016 • Updated 10 years ago
apache / gobblin
A distributed data integration framework that simplifies common aspects of big data integration such as data ingestion, replication, orga…
☆2,270 • Jun 24, 2026 • Updated 2 weeks ago
dbist / oozie-examples
sample oozie workflows
☆17 • Jun 13, 2017 • Updated 9 years ago
imduffy15 / spark-avro-compactor
Spark job for compacting avro files together
☆12 • Jan 26, 2018 • Updated 8 years ago
apache / eagle
Mirror of Apache Eagle
☆411 • Aug 22, 2020 • Updated 5 years ago
kaltura / kanalony
Kaltura's next generation Analytics solution based on Spark, Cassandra and Kafka
☆12 • Mar 31, 2023 • Updated 3 years ago
tresata / spark-kafka
Low level integration of Spark and Kafka
☆129 • Mar 15, 2018 • Updated 8 years ago
yrashk / relex
Erlang/Elixir Release Assembler
☆60 • Apr 21, 2014 • Updated 12 years ago
toddlipcon / hadoop-lzo
Patched, refactored version of code.google.com/hadoop-gpl-compression for hadoop 0.20
☆37 • Aug 13, 2012 • Updated 13 years ago
amplab / velox-modelserver
☆110 • Apr 17, 2017 • Updated 9 years ago
zaratsian / HDP_Tuning_Unofficial
Collection of HDP Tuning Tricks & Tips (unofficial guide)
☆17 • Sep 26, 2017 • Updated 8 years ago