keypointt / readingLinks
collection of read materials
☆18Updated 5 years ago
Alternatives and similar repositories for reading
Users who are interested in reading are comparing it to the repositories listed below
- Cache File System optimized for columnar formats and object stores☆184Updated 3 years ago
- Data Pipeline Clientlib provides an interface to tail and publish to data pipeline topics.☆110Updated 3 years ago
- A few things we've encountered during our Spark-based ETL project☆24Updated 7 years ago
- DynoYARN is a framework to run simulated YARN clusters and workloads for YARN scale testing.☆62Updated 2 years ago
- A library for Spark DataFrame using MinIO Select API☆99Updated 6 years ago
- A schema store service that tracks and manages all the schemas used in the Data Pipeline☆88Updated 4 years ago
- A tool for scale and performance testing of HDFS with a specific focus on the NameNode.☆133Updated last year
- Tools for Hadoop☆25Updated 13 years ago
- ☆81Updated last year
- Real-time Exploratory Analytics on Large Datasets☆122Updated 5 years ago
- Stocator is a high-performing connector to object storage for Apache Spark, achieving performance by leveraging object storage semantics.☆115Updated last year
- Hops Hadoop is a distribution of Apache Hadoop with distributed metadata.☆318Updated 4 months ago
- The SpliceSQL Engine☆170Updated 2 years ago
- Functional testing framework for Big Data pipelines.☆58Updated 2 years ago
- A high-performance, reliable and extensible logging agent for uploading data to Kafka, Pulsar, etc.☆183Updated last week
- an anagram☆136Updated 4 years ago
- ☆37Updated 6 years ago
- Apache Cassandra cluster orchestration tool for the command line☆258Updated last year
- Website for DataSketches.☆104Updated 2 weeks ago
- ☆34Updated 4 years ago
- Visualize your HDFS cluster usage☆229Updated 4 years ago
- Framework for running macro benchmarks in a clustered environment☆25Updated 3 years ago
- Mirus is a cross data-center data replication tool for Apache Kafka☆205Updated 3 months ago
- A composable framework for fast and scalable data analytics☆57Updated 2 years ago
- Data Sketches for Apache Spark☆22Updated 2 years ago
- A Spark-based data comparison tool at scale which facilitates software development engineers to compare a plethora of pair combinations o…☆52Updated 3 months ago
- StreamLine - Streaming Analytics☆165Updated 2 years ago
- UberScriptQuery, a SQL-like DSL to make writing Spark jobs super easy☆63Updated last year
- Fast and scalable timeseries database☆26Updated 5 years ago
- Lightweight proxy to expose the UI of an Apache Spark cluster that is behind a firewall☆98Updated 5 years ago