rdblue/s3committer

Hadoop output committers for S3

☆114

Alternatives and similar repositories for s3committer

Users that are interested in s3committer are comparing it to the libraries listed below.

hortonworks-spark / cloud-integration
Spark cloud integration: tests, cloud committers and more
☆20 · Jan 30, 2025 · Updated last year
Netflix / iceberg
Iceberg is a table format for large, slow-moving tabular data
☆494 · Apr 10, 2023 · Updated 3 years ago
ExpediaGroup / shunting-yard
Shunting Yard is a real-time data replication tool that copies data between Hive Metastores.
☆20 · Oct 11, 2021 · Updated 4 years ago
maropu / spark-data-repair-plugin
Provide functionality to build statistical models to repair dirty tabular data in Spark
☆12 · Apr 21, 2023 · Updated 3 years ago
traviscrawford / spark-dynamodb
DynamoDB data source for Apache Spark
☆95 · Sep 2, 2021 · Updated 4 years ago
holdenk / spark-testing-base
Base classes to use when writing tests with Spark
☆1,553 · Apr 20, 2026 · Updated 3 months ago
nchammas / flintrock
A command-line tool for launching Apache Spark clusters.
☆651 · Dec 13, 2024 · Updated last year
nerdammer / spark-additions
Utilities for Apache Spark
☆34 · Mar 5, 2016 · Updated 10 years ago
databricks / sbt-spark-package
Sbt plugin for Spark packages
☆152 · Jan 10, 2018 · Updated 8 years ago
jupyter-scala / ammonium
Impatient fork of Ammonite
☆63 · Jul 30, 2018 · Updated 7 years ago
7mind / idealingua-example
Scala backend + TypeScript frontend Example for Idealingua API Language
☆10 · May 5, 2021 · Updated 5 years ago
vvaks0 / AvroSchemaShredder
Avro Schema Shredder is a REST API that enables storage of Avro Schemas in Apache Atlas. This API enables an organization to use Apache A…
☆13 · Jan 11, 2017 · Updated 9 years ago
hortonworks-spark / spark-schema-registry
Schema Registry integration for Apache Spark
☆40 · Nov 16, 2022 · Updated 3 years ago
hammerlab / sbt-parent
SBT plugins for publishing to Maven Central, shading and managing dependencies, reporting to Coveralls from TravisCI, and more
☆14 · Nov 13, 2020 · Updated 5 years ago
full360 / glue-sneaql-demo
☆12 · Mar 31, 2021 · Updated 5 years ago
awslabs / cloudformation-ldaps-haproxy-template
Configure an LDAPS Endpoint for Simple AD
☆14 · Aug 29, 2017 · Updated 8 years ago
steveloughran / cloudstore
Hadoop utility jar for troubleshooting integration with cloud object stores
☆38 · Jun 29, 2026 · Updated 3 weeks ago
MrPowers / spark-slack
Speak Slack notifications and process Slack slash commands
☆15 · Dec 20, 2018 · Updated 7 years ago
qubole / kinesis-sql
Kinesis Connector for Structured Streaming
☆139 · Jul 2, 2024 · Updated 2 years ago
wrmsr / wava
Java backend/toolchain for WebAssembly
☆16 · Nov 16, 2022 · Updated 3 years ago
lampepfl / dotty-knowledge
A knowledge base of Dotty internals and all things related
☆19 · Jul 25, 2019 · Updated 6 years ago
funkyminds / cleanframes
type-class based data cleansing library for Apache Spark SQL
☆79 · Jun 23, 2019 · Updated 7 years ago
7mind / idealingua-v1
IdeaLingua RPC for Scala, TypeScript, C#
☆22 · Jun 16, 2026 · Updated last month
minio / spark-streaming-checkpoint
Spark Streaming Checkpoint File Manager for MinIO
☆11 · Apr 25, 2023 · Updated 3 years ago
ryu2 / isign
Code sign iOS applications, without proprietary Apple software or hardware
☆17 · Jun 21, 2017 · Updated 9 years ago
qubole / sparklens
Qubole Sparklens tool for performance tuning Apache Spark
☆592 · Jun 26, 2024 · Updated 2 years ago
maropu / spark-sql-server
Yet Another Spark SQL JDBC/ODBC server based on the PostgreSQL V3 protocol
☆34 · Sep 8, 2022 · Updated 3 years ago
dask-contrib / dask-snowflake
Dask integration for Snowflake
☆32 · Aug 4, 2025 · Updated 11 months ago
tylertreat / Zinc
Real-time distributed messaging and document synchronization.
☆15 · May 8, 2013 · Updated 13 years ago
uber / RemoteShuffleService
Remote shuffle service for Apache Spark to store shuffle data on remote servers.
☆335 · Sep 29, 2023 · Updated 2 years ago
AndreSchumacher / avro-parquet-spark-example
An example of using Avro and Parquet in Spark SQL
☆60 · Nov 16, 2015 · Updated 10 years ago
mrocklin / dask-spark
Dask and Spark interactions
☆21 · Mar 13, 2017 · Updated 9 years ago
jmd1011 / parquet-readers
Apache Parquet reader in Scala without Apache Spark - developed at Purdue University
☆12 · Feb 17, 2017 · Updated 9 years ago
typelevel / frameless
Expressive types for Spark.
☆898 · Updated this week
Netflix / s3mper
s3mper - Consistent Listing for S3
☆232 · Mar 24, 2023 · Updated 3 years ago
hammerlab / yarn-logs-helpers
Scripts for parsing / making sense of yarn logs
☆51 · Aug 22, 2016 · Updated 9 years ago
dmill-bz / gremlin-bin
Placeholder for gremlin-bin website issues
☆19 · Jul 5, 2020 · Updated 6 years ago
swoop-inc / spark-alchemy
Collection of open-source Spark tools & frameworks that have made the data engineering and data science teams at Swoop highly productive
☆191 · Oct 15, 2025 · Updated 9 months ago
steveloughran / zero-rename-committer
Paper: A Zero-rename committer for object stores
☆20 · Nov 7, 2025 · Updated 8 months ago