criteo / berilia
Create a Hadoop cluster on AWS EC2 for development
☆11 Updated 7 years ago
Alternatives and similar repositories for berilia
Users that are interested in berilia are comparing it to the libraries listed below
- Simple Spark example of generating table stats for use of data quality checks ☆28 Updated 8 years ago
- Big Data ETL and Utilities for Hadoop Map Reduce, Spark and Storm ☆102 Updated last year
- Build configuration-driven ETL pipelines on Apache Spark ☆160 Updated 2 years ago
- Bulletproof Apache Spark jobs with fast root cause analysis of failures. ☆73 Updated 4 years ago
- type-class based data cleansing library for Apache Spark SQL ☆78 Updated 6 years ago
- Apache-Spark based Data Flow(ETL) Framework which supports multiple read, write destinations of different types and also support multiple… ☆26 Updated 4 years ago
- A tutorial on Apache Spark Unit Testing ☆37 Updated 9 years ago
- ☆71 Updated 4 years ago
- Spark package for checking data quality ☆221 Updated 5 years ago
- Spark Gotchas. A subjective compilation of the Apache Spark tips and tricks ☆364 Updated 8 years ago
- Spark connector for SFTP ☆100 Updated 2 years ago
- ☆32 Updated 7 years ago
- JSON schema parser for Apache Spark ☆81 Updated 2 years ago
- A framework for creating composable and pluggable data processing pipelines using Apache Spark, and running them on a cluster. ☆47 Updated 8 years ago
- ☆247 Updated 5 years ago
- Cheatsheet for Spark DataFrame ☆91 Updated 5 years ago
- Template for Spark Projects ☆102 Updated last year
- Lighthouse is a library for data lakes built on top of Apache Spark. It provides high-level APIs in Scala to streamline data pipelines an… ☆61 Updated 10 months ago
- XML Serializer/Deserializer for Apache Hive ☆41 Updated 5 years ago
- Waimak is an open-source framework that makes it easier to create complex data flows in Apache Spark. ☆76 Updated last year
- Read SparkSQL parquet file as RDD[Protobuf] ☆93 Updated 6 years ago
- Enabling Spark Optimization through Cross-stack Monitoring and Visualization ☆47 Updated 7 years ago
- Collection of open-source Spark tools & frameworks that have made the data engineering and data science teams at Swoop highly productive ☆186 Updated 2 years ago
- A dynamic data completeness and accuracy library at enterprise scale for Apache Spark ☆29 Updated 8 months ago
- File compaction tool that runs on top of the Spark framework. ☆59 Updated 6 years ago
- A Spark WordCountJob example as a standalone SBT project with Specs2 tests, runnable on Amazon EMR ☆118 Updated 9 years ago
- An example of using Avro and Parquet in Spark SQL ☆60 Updated 9 years ago
- Machine Learning Pipeline Stages for Spark (exposed in Scala/Java + Python) ☆74 Updated last year
- ☆33 Updated 9 years ago
- Simplify getting Zeppelin up and running ☆56 Updated 9 years ago