opensearch-project / opensearch-spark
Spark Accelerator framework; it enables secondary indices to remote data stores.
☆37 · Updated this week
Alternatives and similar repositories for opensearch-spark
Users that are interested in opensearch-spark are comparing it to the libraries listed below
- OpenSearch Benchmark: a community-driven, open-source project to run performance tests for OpenSearch ☆124 · Updated this week
- Query your data using familiar SQL or intuitive Piped Processing Language (PPL) ☆144 · Updated this week
- ☆38 · Updated last week
- ☆25 · Updated last year
- Identify atypical data and receive automatic notifications ☆75 · Updated this week
- A portable Pythonic Data Lakehouse powered by Ray that brings exabyte-level scalability and fast, ACID-compliant, change-data-capture to … ☆233 · Updated last week
- The Amazon S3 Tables catalog is a client library that bridges control plane operations provided by S3 Tables to engines like Apache Spark… ☆130 · Updated last week
- Point-in-Time optimizations for Apache Spark ☆30 · Updated last year
- Spline agent for Apache Spark ☆196 · Updated 2 weeks ago
- Apache DataSketches ☆97 · Updated 2 years ago
- The Workload Analyzer collects Presto® and Trino workload statistics and analyzes them ☆135 · Updated last year
- Neural search transforms text into vectors and facilitates vector search both at ingestion time and at search time. ☆90 · Updated this week
- A new C++ vectorized database acceleration library aimed at optimizing query engines and data processing systems. ☆27 · Updated this week
- ☆86 · Updated this week
- A best practices guide for using AWS EMR. The guide will cover best practices on the topics of cost, performance, security, operational e… ☆107 · Updated last month
- The AWS Glue Data Catalog is a fully managed, Apache Hive Metastore compatible, metadata repository. Customers can use the Data Catalog a… ☆222 · Updated 4 months ago
- 🆕 Find the k-nearest neighbors (k-NN) for your vector data ☆189 · Updated this week
- Low Cost, Simple and Scalable Way of Data Replication to Apache Iceberg/Cloud/Data Lake ☆275 · Updated last week
- LST-Bench is a framework that allows users to run benchmarks specifically designed for evaluating Log-Structured Tables (LSTs) such as De… ☆78 · Updated last week
- Lakehouse storage system benchmark ☆76 · Updated 2 years ago
- Java bindings for https://github.com/facebookincubator/velox ☆32 · Updated this week
- Spark* shuffle plugin to support shuffling data through a remote Hadoop-compatible file system, as opposed to vanilla Spark's local-dis… ☆21 · Updated last year
- Search Request Processor: pipeline for transformation of queries and results inline with a search request. ☆26 · Updated 6 months ago
- ☆215 · Updated this week
- Cache File System optimized for columnar formats and object stores ☆183 · Updated 2 years ago
- ml-commons provides a set of common machine learning algorithms, e.g. k-means or linear regression, to help developers build ML related … ☆123 · Updated this week
- Open Control Plane for Tables in Data Lakehouse ☆362 · Updated last week
- A BYOC option for Snowflake workloads ☆85 · Updated this week
- Apache Flink ☆69 · Updated 3 weeks ago
- All the things about TPC-DS in Apache Spark ☆106 · Updated 2 years ago