keypointt / readingLinks
collection of read materials
☆18 Updated 5 years ago
Alternatives and similar repositories for readingLinks
Users who are interested in readingLinks are comparing it to the repositories listed below
- A high-performance, reliable and extensible logging agent for uploading data to Kafka, Pulsar, etc. ☆185 Updated 2 weeks ago
- A library for Spark DataFrame using MinIO Select API ☆99 Updated 6 years ago
- DynoYARN is a framework to run simulated YARN clusters and workloads for YARN scale testing. ☆60 Updated 2 years ago
- Data Pipeline Clientlib provides an interface to tail and publish to data pipeline topics. ☆110 Updated 3 years ago
- The SpliceSQL Engine ☆171 Updated 2 years ago
- Website for DataSketches. ☆108 Updated last week
- Hops Hadoop is a distribution of Apache Hadoop with distributed metadata. ☆320 Updated 2 weeks ago
- Java event logs collector for Hadoop and frameworks ☆41 Updated 10 months ago
- Avro2TF is designed to fill the gap of making users' training data ready to be consumed by deep learning training frameworks. ☆128 Updated 5 years ago
- ☆81 Updated 2 years ago
- A schema store service that tracks and manages all the schemas used in the Data Pipeline ☆88 Updated 4 years ago
- ☆37 Updated 6 years ago
- Measuring the performance of popular streaming engines with Yahoo's Streaming Benchmark ☆53 Updated 6 years ago
- Cache File System optimized for columnar formats and object stores ☆187 Updated 3 years ago
- Lossy Counting and Sticky Sampling implementation for efficient frequency counts on data streams. ☆63 Updated 9 years ago
- Real-time Exploratory Analytics on Large Datasets ☆121 Updated 6 years ago
- Export Airflow metrics (from MySQL) in Prometheus format ☆29 Updated 9 months ago
- A tool for scale and performance testing of HDFS with a specific focus on the NameNode. ☆134 Updated 2 years ago
- Myria is a scalable Analytics-as-a-Service platform based on relational algebra. ☆116 Updated 4 years ago
- Big Data Processing Framework - Unified Data API or SQL on Any Storage ☆251 Updated 6 months ago
- A plugin to Apache Airflow to allow you to run Spark Submit Commands as an Operator ☆73 Updated 6 years ago
- A Directed Acyclic Graph task dependency scheduler designed to simplify complex distributed pipelines ☆132 Updated 7 years ago
- Mirus is a cross data-center data replication tool for Apache Kafka ☆208 Updated last month
- The Internals of PySpark ☆27 Updated last year
- Kubernetes (K8s) Operator for PrestoDB ☆46 Updated 4 years ago
- A Cascading Workflow Visualizer ☆83 Updated 2 years ago
- Docker Image and Kubernetes Configurations for Spark 2.x ☆41 Updated 6 years ago
- Stocator is a high-performing connector to object storage for Apache Spark, achieving performance by leveraging object storage semantics. ☆114 Updated last year
- ☆34 Updated 4 years ago
- Spark MLlib serving library ☆48 Updated 7 years ago