t-ivanov / BigDataReading
List of papers, reports and links of materials on Big Data and related topics.
☆38, updated 7 years ago
Alternatives and similar repositories for BigDataReading:
Users interested in BigDataReading are comparing it to the repositories listed below.
- List of some interesting projects (☆32, updated 5 years ago)
- Source code for 'PySpark Recipes' by Raju Kumar Mishra (☆25, updated 5 years ago)
- Code and setup information for Introduction to Machine Learning with Spark (☆12, updated 9 years ago)
- Real-world Spark pipeline examples (☆83, updated 7 years ago)
- A curated list of awesome Apache Spark packages and resources (☆40, updated 8 years ago)
- Slowly Changing Dimension type 2 in Hive Query Language, using the exclusive join technique with ORC Hive tables, partitioned and clustered… (☆16, updated 5 years ago)
- Running TPC-H on Apache Hive (☆41, updated 5 years ago)
- Code snippets from the Streaming Systems book (streamingbook.net) (☆247, updated 2 years ago)
- A description of the processes and techniques required to migrate a relational schema to a Cassandra database using Spark and SparkSQL (☆11, updated 7 years ago)
- A composable framework for fast and scalable data analytics (☆57, updated 2 years ago)
- Materials for the Apache Arrow workshop at VLDB 2019 (☆42, updated 4 years ago)
- Apache Spark examples exclusively in Java (☆101, updated last year)
- Data Sketches for Apache Spark (☆22, updated 2 years ago)
- Mirror of the Apache MADlib site (☆89, updated last year)
- These are some code examples (☆55, updated 5 years ago)
- A tutorial on how to get started with Presto (☆56, updated 3 years ago)
- (☆79, updated 2 years ago)
- Readings in Stream Processing (☆122, updated 4 months ago)
- SQL benchmark derived from TPC-H (☆11, updated last year)
- Streaming data changes to a data lake with a Debezium and Delta Lake pipeline (☆75, updated 2 years ago)
- Apache Flink™ training material website (☆78, updated 4 years ago)
- Labs and data files for a full-day Spark workshop (☆24, updated last year)
- Apache Sqoop Cookbook (☆36, updated 11 years ago)
- A scalable, distributed time series database (☆28, updated 10 years ago)
- Flowchart for debugging Spark applications (☆105, updated 6 months ago)
- Example code repository for Getting Started with Impala by John Russell (O'Reilly Media) (☆22, updated 7 years ago)
- [ARCHIVED] Moved to github.com/NVIDIA/spark-xgboost-examples (☆70, updated 4 years ago)
- Spark Terasort (☆122, updated last year)
- Vectorized executor to speed up PostgreSQL (☆332, updated 10 years ago)
- Real-time log event processing using Spark, Kafka & Cassandra (☆13, updated 10 years ago)