Hadoop output committers for S3
☆113 · Jul 9, 2020 · Updated 5 years ago
Alternatives and similar repositories for s3committer
Users that are interested in s3committer are comparing it to the libraries listed below
- Iceberg is a table format for large, slow-moving tabular data · ☆490 · Apr 10, 2023 · Updated 2 years ago
- Paper: A Zero-rename committer for object stores · ☆20 · Nov 7, 2025 · Updated 3 months ago
- Provide functionality to build statistical models to repair dirty tabular data in Spark · ☆12 · Apr 21, 2023 · Updated 2 years ago
- Spark cloud integration: tests, cloud committers and more · ☆20 · Jan 30, 2025 · Updated last year
- A command-line tool for launching Apache Spark clusters. · ☆651 · Dec 13, 2024 · Updated last year
- Shunting Yard is a real-time data replication tool that copies data between Hive Metastores. · ☆20 · Oct 11, 2021 · Updated 4 years ago
- Impatient fork of Ammonite · ☆62 · Jul 30, 2018 · Updated 7 years ago
- DynamoDB data source for Apache Spark · ☆95 · Sep 2, 2021 · Updated 4 years ago
- Base classes to use when writing tests with Spark · ☆1,549 · Dec 22, 2025 · Updated 2 months ago
- type-class based data cleansing library for Apache Spark SQL · ☆78 · Jun 23, 2019 · Updated 6 years ago
- Big Data Toolkit for the JVM · ☆146 · Nov 4, 2020 · Updated 5 years ago
- Qubole Sparklens tool for performance tuning Apache Spark · ☆590 · Jun 26, 2024 · Updated last year
- Scala port of the word2vec toolkit. · ☆11 · Aug 15, 2016 · Updated 9 years ago
- Configure an LDAPS Endpoint for Simple AD · ☆14 · Aug 29, 2017 · Updated 8 years ago
- Spark Streaming Checkpoint File Manager for MinIO · ☆11 · Apr 25, 2023 · Updated 2 years ago
- Kinesis Connector for Structured Streaming · ☆138 · Jul 2, 2024 · Updated last year
- ☆12 · Mar 31, 2021 · Updated 4 years ago
- EmbeDB is a small Python wrapper around LMDB built as key-value storage for embeddings. · ☆14 · Nov 4, 2022 · Updated 3 years ago
- Expressive types for Spark. · ☆895 · Updated this week
- A distributed in-memory fabric based on shared-memory blocks and datashape. Any language can operate on the data. · ☆13 · Feb 12, 2016 · Updated 10 years ago
- A basic example of how to read and write streaming data using Apache Spark and Kafka on HDInsight · ☆13 · Mar 2, 2023 · Updated 3 years ago
- Visualize statistics from the MOOC "Functional Programming Principles in Scala" using Scala! · ☆202 · Mar 31, 2014 · Updated 11 years ago
- The AWS Glue Data Catalog is a fully managed, Apache Hive Metastore compatible, metadata repository. Customers can use the Data Catalog a… · ☆227 · Mar 19, 2025 · Updated 11 months ago
- Dask integration for Snowflake · ☆30 · Aug 4, 2025 · Updated 7 months ago
- Write property based tests easily on spark dataframes · ☆20 · Jan 19, 2024 · Updated 2 years ago
- Speak Slack notifications and process Slack slash commands · ☆15 · Dec 20, 2018 · Updated 7 years ago
- Real-time distributed messaging and document synchronization. · ☆15 · May 8, 2013 · Updated 12 years ago
- An open source indexing subsystem that brings index-based query acceleration to Apache Spark™ and big data workloads. · ☆431 · Jan 14, 2022 · Updated 4 years ago
- Profiler for large-scale distributed java applications (Spark, Scalding, MapReduce, Hive,...) on YARN. · ☆128 · Sep 7, 2018 · Updated 7 years ago
- Mirror of Apache Bahir · ☆335 · Jul 7, 2023 · Updated 2 years ago
- Scripts for parsing / making sense of yarn logs · ☆52 · Aug 22, 2016 · Updated 9 years ago
- Performance optimization for Spark running on Kubernetes · ☆87 · Aug 18, 2020 · Updated 5 years ago
- A Time Series Library for Apache Spark · ☆1,022 · Jul 3, 2020 · Updated 5 years ago
- Visualize column-level data lineage in Spark SQL · ☆92 · May 13, 2022 · Updated 3 years ago
- Apache Parquet reader in Scala without Apache Spark - developed at Purdue University · ☆12 · Feb 17, 2017 · Updated 9 years ago
- Auto-archived due to inactivity. Simple JVM Profiler Using StatsD and Other Metrics Backends · ☆15 · Oct 3, 2023 · Updated 2 years ago
- Utilities for Apache Spark · ☆34 · Mar 5, 2016 · Updated 10 years ago
- Essential Spark extensions and helper methods ✨😲 · ☆766 · Sep 14, 2025 · Updated 5 months ago
- A testing framework for Presto · ☆62 · May 2, 2025 · Updated 10 months ago