opensearch-project / opensearch-spark
Spark Accelerator framework; it enables secondary indices to remote data stores.
☆35Updated 3 weeks ago
Alternatives and similar repositories for opensearch-spark
Users who are interested in opensearch-spark are comparing it to the libraries listed below.
- Query your data using familiar SQL or intuitive Piped Processing Language (PPL)☆144Updated this week
- OpenSearch Benchmark - a community driven, open source project to run performance tests for OpenSearch☆121Updated last week
- Apache datasketches☆97Updated 2 years ago
- Spark* shuffle plugin to support shuffling data through a remote Hadoop-compatible file system, as opposed to vanilla Spark's local-dis…☆21Updated last year
- ☆25Updated last year
- ☆38Updated this week
- Analytics Accelerator Library for Amazon S3 is an open source library that accelerates data access from client applications to Amazon S3.☆46Updated this week
- Monitoring and insights on your data lakehouse tables☆31Updated this week
- The Workload Analyzer collects Presto® and Trino workload statistics, and analyzes them☆135Updated last year
- The Amazon S3 Tables catalog is a client library that bridges control plane operations provided by S3 Tables to engines like Apache Spark…☆127Updated last month
- ☆27Updated last month
- Apache Wayang(incubating) is the first cross-platform data processing system.☆224Updated this week
- ☆86Updated this week
- Spline agent for Apache Spark☆195Updated this week
- LST-Bench is a framework that allows users to run benchmarks specifically designed for evaluating Log-Structured Tables (LSTs) such as De…☆77Updated this week
- A new C++ vectorized database acceleration library aimed at optimizing query engines and data processing systems.☆26Updated this week
- This project provides fully automated one-click experience to create Cloud and Kubernetes environment to run Data Analytics workload like…☆55Updated 2 years ago
- Spark-Radiant is Apache Spark Performance and Cost Optimizer☆25Updated 6 months ago
- ☆70Updated 6 months ago
- Apache ORC - the smallest, fastest columnar storage for Hadoop workloads☆13Updated 2 months ago
- Lakehouse storage system benchmark☆75Updated 2 years ago
- A machine learning plugin in Open Distro for real time anomaly detection on streaming data.☆81Updated 2 years ago
- Multi-hop declarative data pipelines☆117Updated this week
- User tools for Spark RAPIDS☆62Updated this week
- Point-in-Time optimizations for Apache Spark☆30Updated last year
- A repo for all spark examples using Rapids Accelerator including ETL, ML/DL, etc.☆159Updated 2 weeks ago
- A temporary home for LinkedIn's changes to Apache Iceberg (incubating)☆61Updated 7 months ago
- A best practices guide for using AWS EMR. The guide will cover best practices on the topics of cost, performance, security, operational e…☆107Updated 3 weeks ago
- Low Cost, Simple and Scalable Way of Data Replication to Apache Iceberg/Cloud/Data Lake☆271Updated last week
- ml-commons provides a set of common machine learning algorithms, e.g. k-means, or linear regression, to help developers build ML related …☆123Updated this week