opensearch-project / opensearch-spark
Spark Accelerator framework; it enables secondary indices to remote data stores.
☆37 Updated this week
Alternatives and similar repositories for opensearch-spark
Users who are interested in opensearch-spark are comparing it to the libraries listed below.
- Query your data using familiar SQL or intuitive Piped Processing Language (PPL) ☆149 Updated this week
- OpenSearch Benchmark - a community driven, open source project to run performance tests for OpenSearch ☆129 Updated 3 weeks ago
- ☆25 Updated last year
- The Workload Analyzer collects Presto® and Trino workload statistics, and analyzes them ☆135 Updated last year
- ☆38 Updated last week
- Spline agent for Apache Spark ☆197 Updated 2 weeks ago
- Search Request Processor: pipeline for transformation of queries and results inline with a search request. ☆26 Updated 7 months ago
- ☆225 Updated 2 weeks ago
- The Amazon S3 Tables catalog is a client library that bridges control plane operations provided by S3 Tables to engines like Apache Spark… ☆137 Updated last month
- Low Cost, Simple and Scalable Way of Data Replication to Apache Iceberg/Cloud/Data Lake ☆283 Updated this week
- ml-commons provides a set of common machine learning algorithms, e.g. k-means, or linear regression, to help developers build ML related … ☆129 Updated this week
- Analytics Accelerator Library for Amazon S3 is an open source library that accelerates data access from client applications to Amazon S3. ☆54 Updated this week
- The AWS Glue Data Catalog is a fully managed, Apache Hive Metastore compatible, metadata repository. Customers can use the Data Catalog a… ☆225 Updated 5 months ago
- Rewrite BigQuery, Redshift, Snowflake and Databricks queries into DuckDB compatible SQL (with deep transformation of functions, data type… ☆60 Updated last week
- Apache Flink ☆22 Updated last month
- Mirror of Apache Ranger ☆15 Updated last year
- A portable Pythonic Data Lakehouse powered by Ray that brings exabyte-level scalability and fast, ACID-compliant, change-data-capture to … ☆239 Updated last week
- ☆28 Updated 3 months ago
- Apache ORC - the smallest, fastest columnar storage for Hadoop workloads ☆13 Updated 3 weeks ago
- Official workloads used by OpenSearch Benchmark (OSB) ☆27 Updated this week
- Multi-hop declarative data pipelines ☆118 Updated last week
- ☆80 Updated 4 months ago
- Amundsen Gremlin ☆21 Updated 3 years ago
- Best practices and recommendations for getting started with Amazon EMR on EKS. ☆66 Updated 2 months ago
- ☆70 Updated 8 months ago
- Sparglim✨ makes PySpark App Configurable and Deploy Spark Connect Server Easier! ☆37 Updated last month
- Apache Wayang (incubating) is the first cross-platform data processing system. ☆232 Updated this week
- Enables synchronizing metadata changes (Create/Drop table/partition) from Hive Metastore to AWS Glue Data Catalog ☆35 Updated last year
- Apache DataSketches ☆99 Updated 2 years ago
- Neural search transforms text into vectors and facilitates vector search both at ingestion time and at search time. ☆94 Updated this week