t-ivanov / BigDataReadingLinks
List of papers, reports and links of materials on Big Data and related topics.
☆38 Updated 8 years ago
Alternatives and similar repositories for BigDataReading
Users who are interested in BigDataReading are comparing it to the repositories listed below.
- Real-world Spark pipelines examples ☆83 Updated 7 years ago
- Apache Spark examples exclusively in Java ☆102 Updated 2 years ago
- Examples To Help You Learn Apache Spark ☆77 Updated 7 years ago
- An extension of Yahoo's Benchmarks ☆108 Updated last year
- Cache File System optimized for columnar formats and object stores ☆184 Updated 3 years ago
- Spark Terasort ☆121 Updated 2 years ago
- an anagram ☆136 Updated 4 years ago
- Apache Flink™ training material website ☆78 Updated 5 years ago
- Magic to help Spark pipelines upgrade ☆34 Updated last year
- Code snippets from the Streaming Systems book (streamingbook.net). ☆254 Updated 3 years ago
- List of some interesting projects ☆32 Updated 5 years ago
- A library for Spark DataFrame using MinIO Select API ☆99 Updated 6 years ago
- Readings in Stream Processing ☆125 Updated 2 months ago
- ☆311 Updated 6 years ago
- Use the TPC-DS benchmark to test Spark SQL performance ☆181 Updated 5 years ago
- Testbench for experimenting with Apache Hive at any data scale. ☆64 Updated 8 years ago
- Splash, a flexible Spark shuffle manager that supports user-defined storage backends for shuffle data storage and exchange ☆128 Updated 9 months ago
- Developing Spark External Data Sources using the V2 API ☆48 Updated 7 years ago
- ACID Data Source for Apache Spark based on Hive ACID ☆97 Updated 4 years ago
- Flowchart for debugging Spark applications ☆107 Updated last year
- A series of Jupyter notebooks to demonstrate the functionality of Apache Calcite ☆59 Updated 5 years ago
- A temporary home for LinkedIn's changes to Apache Iceberg (incubating) ☆63 Updated 2 weeks ago
- Circus Train is a dataset replication tool that copies Hive tables between clusters and clouds. ☆90 Updated last year
- Quark is a data virtualization engine over analytic databases. ☆100 Updated 8 years ago
- These are some code examples ☆55 Updated 5 years ago
- Resource for the book Trino: The Definitive Guide (and formerly Presto: The Definitive Guide) ☆230 Updated 3 years ago
- An example Apache Beam project. ☆111 Updated 8 years ago
- Drizzle integration with Apache Spark ☆120 Updated 7 years ago
- A tool to get better debug info on spark's memory usage ☆42 Updated 6 years ago
- Examples of Spark 3.0 ☆46 Updated 4 years ago