t-ivanov / BigDataReadingLinks
List of papers, reports and links of materials on Big Data and related topics.
☆39 · Updated 8 years ago
Alternatives and similar repositories for BigDataReading
Users who are interested in BigDataReading are comparing it to the libraries listed below.
- Apache Spark examples exclusively in Java ☆103 · Updated 2 years ago
- Examples To Help You Learn Apache Spark ☆78 · Updated 7 years ago
- Code snippets from the Streaming Systems book (streamingbook.net). ☆254 · Updated 3 years ago
- Testbench for experimenting with Apache Hive at any data scale. ☆64 · Updated 8 years ago
- Apache Flink™ training material website ☆78 · Updated 5 years ago
- An example Apache Beam project. ☆111 · Updated 8 years ago
- Magic to help Spark pipelines upgrade ☆34 · Updated last year
- Code examples and docker environment for Spark ☆28 · Updated 9 years ago
- Real-world Spark pipelines examples ☆83 · Updated 7 years ago
- A curated list of awesome Apache Spark packages and resources. ☆40 · Updated 8 years ago
- These are some code examples ☆56 · Updated 6 years ago
- A few things we've met during our ETL project based on Spark ☆24 · Updated 7 years ago
- List of some interesting projects ☆32 · Updated 6 years ago
- ☆41 · Updated 9 years ago
- An anagram ☆137 · Updated 4 years ago
- Developing Spark External Data Sources using the V2 API ☆48 · Updated 7 years ago
- An extension of Yahoo's Benchmarks ☆109 · Updated 2 years ago
- Code samples for the book ☆39 · Updated 12 years ago
- Labs and data files for a full-day Spark workshop ☆24 · Updated 8 months ago
- A complete example of a big data application using: Kubernetes (kops/aws), Apache Spark SQL/Streaming/MLlib, Apache Flink, Scala, Python,… ☆210 · Updated 7 years ago
- An implementation of a real-world map-reduce workflow in each major framework. ☆152 · Updated 9 years ago
- Docker Image and Kubernetes Configurations for Spark 2.x ☆41 · Updated 6 years ago
- The SpliceSQL Engine ☆171 · Updated 2 years ago
- A series of Jupyter notebooks to demonstrate the functionality of Apache Calcite ☆59 · Updated 5 years ago
- Tools for building, packaging, and OAP public cloud integrations such as AWS EMR, Google Dataproc and K8S. ☆18 · Updated last year
- Circus Train is a dataset replication tool that copies Hive tables between clusters and clouds. ☆91 · Updated last year
- A tutorial on Apache Spark Unit Testing ☆37 · Updated 10 years ago
- Quark is a data virtualization engine over analytic databases. ☆100 · Updated 8 years ago
- Splash, a flexible Spark shuffle manager that supports user-defined storage backends for shuffle data storage and exchange ☆130 · Updated last year
- Drizzle integration with Apache Spark ☆120 · Updated 7 years ago