t-ivanov / BigDataReadingLinks
List of papers, reports and links of materials on Big Data and related topics.
☆38 Updated 8 years ago
Alternatives and similar repositories for BigDataReadingLinks
Users who are interested in BigDataReadingLinks are comparing it to the repositories listed below
- Code snippets from the Streaming Systems book (streamingbook.net). ☆253 Updated 3 years ago
- an anagram ☆136 Updated 3 years ago
- Apache Spark examples exclusively in Java ☆102 Updated 2 years ago
- A few things we encountered during our ETL project based on Spark ☆24 Updated 7 years ago
- Code samples for the book ☆39 Updated 11 years ago
- Real-world Spark pipelines examples ☆83 Updated 7 years ago
- Examples To Help You Learn Apache Spark ☆77 Updated 6 years ago
- Apache Flink™ training material website ☆78 Updated 5 years ago
- An extension of Yahoo's Benchmarks ☆107 Updated last year
- Magic to help Spark pipelines upgrade ☆34 Updated 10 months ago
- Code examples and docker environment for Spark ☆27 Updated 9 years ago
- Spark Terasort ☆121 Updated 2 years ago
- Drizzle integration with Apache Spark ☆120 Updated 6 years ago
- Use the TPC-DS benchmark to test Spark SQL performance ☆180 Updated 5 years ago
- Quark is a data virtualization engine over analytic databases. ☆98 Updated 8 years ago
- An example Apache Beam project. ☆111 Updated 8 years ago
- Stocator is a high-performing connector to object storage for Apache Spark, achieving performance by leveraging object storage semantics. ☆114 Updated last year
- A curated list of awesome Apache Spark packages and resources. ☆40 Updated 8 years ago
- Flowchart for debugging Spark applications ☆106 Updated 10 months ago
- Labs and data files for a full-day Spark workshop ☆24 Updated 2 months ago
- ☆311 Updated 6 years ago
- Resource for the book Trino: The Definitive Guide (and formerly Presto: The Definitive Guide) ☆226 Updated 2 years ago
- Readings in Stream Processing ☆122 Updated last month
- [ARCHIVED] Moved to github.com/NVIDIA/spark-xgboost-examples ☆72 Updated 5 years ago
- No longer maintained and soon to be deleted ☆77 Updated 5 years ago
- Mirror of Apache crail (Incubating) ☆150 Updated 3 years ago
- List of some interesting projects ☆32 Updated 5 years ago
- Tools for building, packaging, and OAP public cloud integrations such as AWS EMR, Google Dataproc and K8S. ☆17 Updated last year
- Circus Train is a dataset replication tool that copies Hive tables between clusters and clouds. ☆89 Updated last year
- UberScriptQuery, a SQL-like DSL to make writing Spark jobs super easy ☆62 Updated last year