Material for the course "Introduction to Apache Spark APIs for Data Processing" https://sparktraining.web.cern.ch/
☆19May 13, 2025Updated last year
Alternatives and similar repositories for SparkTraining
Users that are interested in SparkTraining are comparing it to the libraries listed below.
- Create a streaming pipeline using Kafka and Kafka Connect☆14Jun 29, 2020Updated 5 years ago
- On-the-fly translation of Spark programs to run natively on your Oracle DB. Your Spark programs require no changes.☆35Apr 15, 2025Updated last year
- Chess Engine made in PL/SQL☆12Aug 12, 2024Updated last year
- Serializable ACID transactions on streaming data☆25Oct 21, 2022Updated 3 years ago
- Spark SQL Macros provides a mechanism similar to Spark User-Defined function registration; with the key enhancement being that custom cod…☆16Mar 17, 2021Updated 5 years ago
- ☆18Jul 16, 2017Updated 8 years ago
- Perturbation Benchmark☆20Mar 15, 2018Updated 8 years ago
- General purpose framework to run CMS experiment workflows on HDFS/Spark platform☆12Jun 11, 2026Updated last week
- Oracle Database Free GitHub Action☆22Mar 2, 2025Updated last year
- Deep Learning Compression and Acceleration SDK -- deep model compression for Edge and IoT embedded systems, and deep model acceleration f…☆20Mar 17, 2018Updated 8 years ago
- Birgitta is a Python ETL test and schema framework, providing automated tests for pyspark notebooks/recipes.☆16Nov 9, 2023Updated 2 years ago
- Variable Selection Network with PyTorch☆12May 29, 2024Updated 2 years ago
- Hands-On Data Warehousing with Azure Data Factory, published by Packt☆15Jan 18, 2023Updated 3 years ago
- FITS data source for Spark SQL and DataFrames☆22Apr 12, 2023Updated 3 years ago
- This is a repo to implement Anomaly Detection which is the technique of identifying rare events or observations which can raise suspicion…☆22Jan 25, 2023Updated 3 years ago
- ☆11Aug 14, 2014Updated 11 years ago
- Collection of dockerized ETL jobs managed by data engineering.☆22Updated this week
- Combination of Dockerized Hortonworks projects and other Hadoop ecosystem components☆10Oct 11, 2019Updated 6 years ago
- Apache Spark Data Source for ROOT File Format☆29Jul 18, 2019Updated 6 years ago
- Create a data mart using Azure Data Factory as ELT / ETL, Azure Synapse as database and Power BI as visualization tool.☆19Apr 20, 2022Updated 4 years ago
- ☆11May 16, 2022Updated 4 years ago
- Spawns JupyterHub single user servers in Marathon☆10Oct 8, 2017Updated 8 years ago
- Docker Image - Tadpole DB Hub☆14Jul 28, 2021Updated 4 years ago
- A sample solution to demonstrate microservices + data platform + mlops platform on Azure☆15Mar 20, 2023Updated 3 years ago
- Apache Spark examples exclusively in Java☆105Apr 21, 2023Updated 3 years ago
- Go tool for building, distributing and publishing Go projects☆31Jun 11, 2026Updated last week
- Adaptive File Source Connector for Spark, optimised for reading from object stores☆15Oct 18, 2022Updated 3 years ago
- Sandbox of Spark-notebook☆10Mar 19, 2016Updated 10 years ago
- An expansive bundle of NiFi additions intended to be used for generating test data☆11Aug 6, 2023Updated 2 years ago
- Java code for Apache Nifi processors☆11Jun 5, 2017Updated 9 years ago
- LDAP to RestAPI Gateway Server☆12Dec 4, 2017Updated 8 years ago
- Go Client for Hive Metastore☆14Dec 18, 2022Updated 3 years ago
- Spark extension for processing large-scale 3D data sets: Astrophysics, High Energy Physics, Meteorology, …☆32Apr 12, 2023Updated 3 years ago
- A curated list of awesome PrestoDB / Trino software, libraries, tools and resources☆18Jun 28, 2021Updated 4 years ago
- ☆13Feb 11, 2024Updated 2 years ago
- Code and links to the data for the article "Machine Learning Pipelines with Modern Big Data Tools for High Energy Physics"☆31Jun 11, 2024Updated 2 years ago
- JupyterLab Notebook for Mesosphere DC/OS☆11Aug 6, 2019Updated 6 years ago
- Tracking Responses to the "Reproducibility in Computer Science" Repository (http://reproducibility.cs.arizona.edu/)☆60Oct 2, 2014Updated 11 years ago
- ☁️ Cloud native cats with prometheus metrics, kubernetes ready and cute as hell 🐈☆15Dec 7, 2018Updated 7 years ago