Hadoop output committers for S3
☆114 · Jul 9, 2020 · Updated 5 years ago
Alternatives and similar repositories for s3committer
Users that are interested in s3committer are comparing it to the libraries listed below.
- Parquet Command-line Tools ☆19 · Oct 26, 2016 · Updated 9 years ago
- Spark cloud integration: tests, cloud committers and more ☆20 · Jan 30, 2025 · Updated last year
- Iceberg is a table format for large, slow-moving tabular data ☆494 · Apr 10, 2023 · Updated 3 years ago
- Shunting Yard is a real-time data replication tool that copies data between Hive Metastores. ☆20 · Oct 11, 2021 · Updated 4 years ago
- Provide functionality to build statistical models to repair dirty tabular data in Spark ☆12 · Apr 21, 2023 · Updated 3 years ago
- DynamoDB data source for Apache Spark ☆95 · Sep 2, 2021 · Updated 4 years ago
- Base classes to use when writing tests with Spark ☆1,552 · Apr 20, 2026 · Updated 2 months ago
- A command-line tool for launching Apache Spark clusters. ☆652 · Dec 13, 2024 · Updated last year
- Utilities for Apache Spark ☆34 · Mar 5, 2016 · Updated 10 years ago
- Impatient fork of Ammonite ☆63 · Jul 30, 2018 · Updated 7 years ago
- Hadoop utility jar for troubleshooting integration with cloud object stores ☆38 · Updated this week
- Scala backend + TypeScript frontend Example for Idealingua API Language ☆10 · May 5, 2021 · Updated 5 years ago
- s3mper - Consistent Listing for S3 ☆232 · Mar 24, 2023 · Updated 3 years ago
- Schema Registry integration for Apache Spark ☆40 · Nov 16, 2022 · Updated 3 years ago
- Big Data Toolkit for the JVM ☆147 · Nov 4, 2020 · Updated 5 years ago
- Avro Schema Shredder is a REST API that enables storage of Avro Schemas in Apache Atlas. This API enables an organization to use Apache A… ☆13 · Jan 11, 2017 · Updated 9 years ago
- SBT plugins for publishing to Maven Central, shading and managing dependencies, reporting to Coveralls from TravisCI, and more ☆14 · Nov 13, 2020 · Updated 5 years ago
- Kinesis Connector for Structured Streaming ☆139 · Jul 2, 2024 · Updated 2 years ago
- Paper: A Zero-rename committer for object stores ☆20 · Nov 7, 2025 · Updated 7 months ago
- ☆12 · Mar 31, 2021 · Updated 5 years ago
- Speak Slack notifications and process Slack slash commands ☆15 · Dec 20, 2018 · Updated 7 years ago
- Qubole Sparklens tool for performance tuning Apache Spark ☆591 · Jun 26, 2024 · Updated 2 years ago
- type-class based data cleansing library for Apache Spark SQL ☆79 · Jun 23, 2019 · Updated 7 years ago
- A knowledge base of Dotty internals and all things related ☆19 · Jul 25, 2019 · Updated 6 years ago
- Spark Streaming Checkpoint File Manager for MinIO ☆11 · Apr 25, 2023 · Updated 3 years ago
- A library on top of either pex or conda-pack to make your Python code easily available on a cluster ☆47 · Feb 4, 2026 · Updated 5 months ago
- IdeaLingua RPC for Scala, TypeScript, C# ☆21 · Jun 16, 2026 · Updated 2 weeks ago
- Real-time distributed messaging and document synchronization. ☆15 · May 8, 2013 · Updated 13 years ago
- Yet Another Spark SQL JDBC/ODBC server based on the PostgreSQL V3 protocol ☆34 · Sep 8, 2022 · Updated 3 years ago
- Code sign iOS applications, without proprietary Apple software or hardware ☆17 · Jun 21, 2017 · Updated 9 years ago
- Remote shuffle service for Apache Spark to store shuffle data on remote servers. ☆335 · Sep 29, 2023 · Updated 2 years ago
- Expressive types for Spark. ☆898 · Jun 22, 2026 · Updated last week
- Dask and Spark interactions ☆21 · Mar 13, 2017 · Updated 9 years ago
- Apache Parquet reader in Scala without Apache Spark - developed at Purdue University ☆12 · Feb 17, 2017 · Updated 9 years ago
- API for converting JVM objects to representations by MIME type, for the Jupyter ecosystem. ☆26 · Jan 16, 2020 · Updated 6 years ago
- Dask integration for Snowflake ☆32 · Aug 4, 2025 · Updated 11 months ago
- Dr. Elephant is a job and flow-level performance monitoring and tuning tool for Apache Hadoop and Apache Spark ☆1,368 · Aug 22, 2023 · Updated 2 years ago
- This repo was moved to https://github.com/appcelerator-developer-relations/Template.Hierarchical-Navigation ☆50 · Apr 2, 2015 · Updated 11 years ago
- Collection of open-source Spark tools & frameworks that have made the data engineering and data science teams at Swoop highly productive ☆191 · Oct 15, 2025 · Updated 8 months ago