keypointt / reading
Collection of reading materials
☆18 · Updated 4 years ago
Alternatives and similar repositories for reading:
Users who are interested in reading are comparing it to the libraries listed below.
- ☆9 · Updated 9 years ago
- DynoYARN is a framework to run simulated YARN clusters and workloads for YARN scale testing. ☆58 · Updated last year
- ☆37 · Updated 5 years ago
- A library for Spark DataFrame using MinIO Select API ☆97 · Updated 5 years ago
- Read Druid segments from Hadoop ☆10 · Updated 8 years ago
- Convert a CSV file to ORCFile ☆26 · Updated 5 years ago
- A few things we ran into during our ETL project based on Spark ☆24 · Updated 6 years ago
- Data Pipeline Clientlib provides an interface to tail and publish to data pipeline topics. ☆110 · Updated 2 years ago
- Everything about Apache Hive that is awesome ☆14 · Updated 4 years ago
- hive_compared_bq compares/validates two (SQL-like) tables and graphically shows the rows/columns that are different. ☆28 · Updated 7 years ago
- Paper: A zero-rename committer for object stores ☆20 · Updated 3 years ago
- Database Benchmark Tool ☆154 · Updated 11 months ago
- Lab for testing different Flink job latency optimization techniques covered in a Flink Forward 2021 talk ☆27 · Updated 3 years ago
- Framework for running macro benchmarks in a clustered environment ☆24 · Updated 2 years ago
- ☆34 · Updated 3 years ago
- A plugin for Apache Airflow that lets you run Spark Submit commands as an Operator ☆73 · Updated 5 years ago
- The Internals of PySpark ☆25 · Updated 3 weeks ago
- ☆62 · Updated 5 years ago
- Kubernetes (K8s) Operator for PrestoDB ☆46 · Updated 3 years ago
- Dione - a Spark and HDFS indexing library ☆50 · Updated 10 months ago
- Export Airflow metrics (from MySQL) in Prometheus format ☆29 · Updated 2 years ago
- Data Catalog is a service for indexing parameterized, strongly-typed data artifacts across revisions. It also powers Flyte's memoization s… ☆54 · Updated last year
- spark-emr ☆15 · Updated 10 years ago
- ☆47 · Updated 5 months ago
- Spark SQL index for Parquet tables ☆134 · Updated 3 years ago
- A framework to benchmark different graph databases, based on generated data from customizable schema, distribution, and size. ☆26 · Updated 5 years ago
- Splittable Gzip codec for Hadoop ☆69 · Updated last month
- Go Client for Hive Metastore ☆14 · Updated 2 years ago