keypointt / readingLinks
collection of read materials
☆18 Updated 5 years ago
Alternatives and similar repositories for readingLinks
Users who are interested in readingLinks are comparing it to the repositories listed below
- A library for Spark DataFrame using MinIO Select API ☆99 Updated 6 years ago
- Data Pipeline Clientlib provides an interface to tail and publish to data pipeline topics. ☆110 Updated 3 years ago
- Mirus is a cross data-center data replication tool for Apache Kafka ☆208 Updated last month
- A high-performance, reliable and extensible logging agent for uploading data to Kafka, Pulsar, etc. ☆185 Updated 2 weeks ago
- Paper: A Zero-rename committer for object stores ☆20 Updated 3 months ago
- A tool for scale and performance testing of HDFS with a specific focus on the NameNode. ☆134 Updated 2 years ago
- Circus Train is a dataset replication tool that copies Hive tables between clusters and clouds. ☆91 Updated last year
- ☆34 Updated 4 years ago
- DynoYARN is a framework to run simulated YARN clusters and workloads for YARN scale testing. ☆60 Updated 2 years ago
- Database Benchmark Tool ☆154 Updated 2 years ago
- Cache File System optimized for columnar formats and object stores ☆187 Updated 3 years ago
- Simple Scalable Time Series Database ☆130 Updated 3 years ago
- Lightweight proxy to expose the UI of an Apache Spark cluster that is behind a firewall ☆98 Updated 5 years ago
- Export Airflow metrics (from mysql) in prometheus format ☆29 Updated 9 months ago
- Myria is a scalable Analytics-as-a-Service platform based on relational algebra. ☆116 Updated 4 years ago
- Scripts and templates for automating Cassandra benchmark environment creation on AWS. ☆35 Updated 7 years ago
- The SpliceSQL Engine ☆171 Updated 2 years ago
- Website for DataSketches. ☆108 Updated 2 weeks ago
- Ansible playbooks for Apache Spark on kube ☆27 Updated 8 years ago
- Development repository for the kafka cookbook ☆91 Updated this week
- Hops Hadoop is a distribution of Apache Hadoop with distributed metadata. ☆320 Updated 2 weeks ago
- Java event logs collector for hadoop and frameworks ☆41 Updated 10 months ago
- Spark-Radiant is Apache Spark Performance and Cost Optimizer ☆25 Updated last year
- Tool for gathering blocks and replicas meta data from HDFS. It also builds a heat map showing how replicas are distributed along disks an… ☆55 Updated 8 years ago
- A Directed Acyclic Graph task dependency scheduler designed to simplify complex distributed pipelines ☆132 Updated 7 years ago
- Apiary provides modules which can be combined to create a federated cloud data lake ☆37 Updated last year
- ☆205 Updated 2 years ago
- A home for LinkedIn's changes to Apache Iceberg ☆63 Updated 2 weeks ago
- A nozzle to spray a kafka topic at an HTTP endpoint. This project is deprecated and not maintained. ☆49 Updated 6 years ago
- AWS bootstrap scripts for Mozilla's flavoured Spark setup. ☆47 Updated 5 years ago