nielsbasjes/splittablegzip

Splittable Gzip codec for Hadoop

☆79

Alternatives and similar repositories for splittablegzip

Users who are interested in splittablegzip are comparing it to the repositories listed below.

zheyuan28 / SparkTaskMetrics
Task Metrics Explorer
☆14 · Apr 2, 2019 · Updated 7 years ago
databricks / databricks-sql-cli
CLI for querying Databricks SQL
☆45 · Nov 24, 2023 · Updated 2 years ago
vemonet / setup-spark
✨ Setup Apache Spark in GitHub Action workflows
☆23 · Oct 30, 2024 · Updated last year
crflynn / pbspark
protobuf pyspark conversion
☆23 · Jun 7, 2023 · Updated 3 years ago
nielsbasjes / kafka-rpm
Scripting to create a CentOS RPM for Kafka.
☆21 · Nov 25, 2015 · Updated 10 years ago
MarekLani / Scala-Spark-VSCode-Remote-Containers
☆12 · Jan 18, 2021 · Updated 5 years ago
saikrishnapujari / Spark-Nested-Data-Parser
Nested Data (JSON/AVRO/XML) Parsing and Flattening in Spark
☆16 · Jan 22, 2024 · Updated 2 years ago
geotrellis / geotrellis-spray-tutorial-deprecated
OLD VERSION OF GEOTRELLIS: A sample GIS service built using GeoTrellis and Spray
☆15 · Sep 30, 2016 · Updated 9 years ago
swoop-inc / spark-alchemy
Collection of open-source Spark tools & frameworks that have made the data engineering and data science teams at Swoop highly productive
☆191 · Oct 15, 2025 · Updated 9 months ago
adamw / spray-tutorial
Spray.io tutorial
☆22 · Nov 5, 2014 · Updated 11 years ago
greeenboi / kanflow2.0
Enterprise-Grade, Task Management Software. All in one place, even your data..
☆16 · Nov 8, 2025 · Updated 8 months ago
sushilshah / gsp-sqlparser
gsp-sqlparser
☆13 · Mar 28, 2018 · Updated 8 years ago
zaratsian / network_topology_analysis
Code to collect and analyze traceroute data within a network topology
☆29 · Nov 20, 2018 · Updated 7 years ago
lightning-viz / lightning-default-visualizations
default visualizations that come packaged with the lightning viz notebook
☆12 · Apr 18, 2016 · Updated 10 years ago
mattoopie / fold
Advanced fold methods for Kotlin
☆13 · Jul 2, 2026 · Updated 3 weeks ago
mick2004 / nested-aad-scim-connector
☆11 · Jan 10, 2025 · Updated last year
CreativeMindstorms / AI-LEGO-HEAD
Python code that allows for a full AI chat assistant experience with a Lego Mindstorms Robotic Head. This includes vision.
☆22 · Feb 11, 2026 · Updated 5 months ago
paulhoule / RDFeasy
Software for packaging RDF data on Virtual Machines
☆21 · Oct 16, 2016 · Updated 9 years ago
bububa / xgboost-go
xgboost go wrapper for c_api
☆22 · Apr 18, 2018 · Updated 8 years ago
steveloughran / zero-rename-committer
Paper: A Zero-rename committer for object stores
☆20 · Nov 7, 2025 · Updated 8 months ago
cerndb / sparkMeasure
This is a mirror of https://github.com/LucaCanali/sparkMeasure - sparkMeasure is a tool for performance troubleshooting of Apache Spark w…
☆16 · May 21, 2026 · Updated 2 months ago
andyweaves / system-tables-audit-logs
SQL Queries & Alerts for Databricks System Tables access.audit Logs
☆50 · Jun 29, 2026 · Updated 3 weeks ago
DataSystemsGroupUT / SPARKSQLRDFBenchmarking
A systematic Benchmarking on the performance of Spark-SQL for processing Vast RDF datasets
☆14 · Jun 29, 2022 · Updated 4 years ago
zjffdu / flink-udf
☆12 · Mar 12, 2021 · Updated 5 years ago
bdelbosc / jmxstat
Poll JMX attributes from the command line
☆17 · Sep 4, 2012 · Updated 13 years ago
huggingface / dataset-dedupe-estimator
parquet dedupe estimator
☆27 · May 26, 2026 · Updated 2 months ago
JaneliaSciComp / nextflow-spark
☆15 · Apr 2, 2025 · Updated last year
R-Wright-1 / kraken_metaphlan_comparison
R markdown documents copied from other folders on Dec 15th 2021
☆14 · Nov 4, 2025 · Updated 8 months ago
amesar / mlflow-tools
Tools for MLflow
☆41 · Jan 31, 2024 · Updated 2 years ago
bryanyang0528 / docker-spark-hive-ipython
Spark + Jupyer + Hive
☆16 · Sep 22, 2015 · Updated 10 years ago
pm-dev / janusgraph-exploration
A simple web-app that demonstrates a graph-based architecture using JanusGraph
☆13 · Nov 27, 2020 · Updated 5 years ago
liquibase / liquibase-databricks
☆36 · Updated this week
luke831215 / SQUAT
a Sequencing Quality Assessment Tool
☆17 · Jul 28, 2022 · Updated 3 years ago
jsellam / react-gdpr
React components for Cookies settings panel, cookie banner and redux store for GDPR Compliance websites
☆11 · Sep 14, 2018 · Updated 7 years ago
SemyonSinchenko / flake8-pyspark-with-column
A flake8 plugin that detects of usage withColumn in a loop or inside reduce
☆28 · Jun 20, 2025 · Updated last year
yhoogstrate / fastafs
toolkit for file system virtualisation of random access compressed FASTA, FAI, DICT & TWOBIT files
☆22 · Updated this week
Kyligence / kylin-tpch
Run TPCH Benchmark on Apache Kylin
☆22 · Jan 24, 2022 · Updated 4 years ago
twitter-archive / hdfs-du
Visualize your HDFS cluster usage
☆228 · Oct 13, 2020 · Updated 5 years ago
cheburek3000 / meme_dumper
Meme dumper
☆18 · May 10, 2026 · Updated 2 months ago