Example repo for full end-to-end PySpark testing via docker-compose
☆31 · Feb 6, 2023 · Updated 3 years ago
Alternatives and similar repositories for pyspark-testing-env
Users that are interested in pyspark-testing-env are comparing it to the libraries listed below.
- Implementation of Boundary Attributions for Normal (Vector) Explanations ☆11 · Aug 13, 2021 · Updated 4 years ago
- Data Engineer Roadmaps as Projects Funnel ☆12 · Aug 10, 2022 · Updated 3 years ago
- In this article, you will learn how to set up a real-time data processing and analytics environment using Docker, MySQL, Redpanda, MinIO,… ☆11 · Jun 27, 2023 · Updated 2 years ago
- A production-ready PySpark project template with medallion architecture, Python packaging, unit tests, integration tests, CI/CD automatio… ☆65 · May 7, 2026 · Updated last week
- Simple library to export your bookmarks to the popular bookmarks platform LinkDing ☆20 · Jan 22, 2026 · Updated 3 months ago
- Data science, machine learning tools on the cloud ☆15 · Jan 13, 2021 · Updated 5 years ago
- Data pipeline project using Data Factory, Databricks and Cosmosdb Graph, deployed using Azure DevOps, secured using firewalls and Azure A… ☆11 · Dec 14, 2022 · Updated 3 years ago
- TensorFlow implementation of the "Prompt-to-Prompt Image Editing with Cross Attention Control" for Stable Diffusion ☆15 · Mar 25, 2023 · Updated 3 years ago
- Deploy a scikit model using heroku and Flask ☆15 · May 1, 2023 · Updated 3 years ago
- Samples for fabric user data functions ☆27 · Updated this week
- A pyproject.toml conversion tool for Poetry to uv migration ☆20 · Dec 28, 2024 · Updated last year
- A cloud data platform product to accelerate time to insights. Our open-source framework is designed for the real world. Stripping away th… ☆25 · May 8, 2026 · Updated last week
- A Covid-19 data pipeline on AWS featuring PySpark/Glue, Docker, Great Expectations, Airflow, and Redshift, templated in CloudFormation an… ☆23 · Nov 21, 2023 · Updated 2 years ago
- A simple script designed to run and use i2p and i2pd on tails os along with the tor network! ☆22 · May 19, 2025 · Updated last year
- quadipy is a python package to help transform structured data into RDF graph format ☆19 · Apr 14, 2023 · Updated 3 years ago
- HIVE: Evaluating the Human Interpretability of Visual Explanations (ECCV 2022) ☆22 · Jan 19, 2023 · Updated 3 years ago
- Code for my "Efficient Data Processing in SQL" book. ☆62 · Aug 6, 2024 · Updated last year
- ☆17 · May 26, 2025 · Updated 11 months ago
- A PoC script for adding dummy GitHub contributions to past dates ☆12 · Nov 27, 2024 · Updated last year
- Repository of notebooks and related collateral used in the Databricks Demo Hub, showing how to use Databricks, Delta Lake, MLflow, and mo… ☆26 · May 27, 2021 · Updated 4 years ago
- A fast data generator that produces CSV files from generated relational data ☆44 · Aug 15, 2025 · Updated 9 months ago
- A Terraform module to create and manage Identity and Access Management (IAM) Users on Amazon Web Services (AWS). https://aws.amazon.com/i… ☆20 · Apr 6, 2022 · Updated 4 years ago
- Nomad launcher/executor for Dagster ☆22 · Oct 2, 2025 · Updated 7 months ago
- The Lakehouse Engine is a configuration driven Spark framework, written in Python, serving as a scalable and distributed engine for sever… ☆288 · Mar 4, 2026 · Updated 2 months ago
- Making Time Speak! 🎙️ ☆29 · Apr 13, 2026 · Updated last month
- For a series of posts on Amazon MSK, Amazon EKS, and Amazon EMR ☆67 · Jan 2, 2022 · Updated 4 years ago
- ☆22 · Apr 10, 2017 · Updated 9 years ago
- The missing workspace tool for clojure tools.deps projects ☆34 · Mar 22, 2026 · Updated last month
- Utility functions to support analytics over FHIR in BigQuery or Apache Spark ☆15 · Jan 8, 2024 · Updated 2 years ago
- For Udemy students: the official repository of Rock the JVM's Spark Streaming course ☆26 · Jan 5, 2023 · Updated 3 years ago
- Demo converting streamlit uber nyc rides to use duckdb ☆30 · Apr 9, 2023 · Updated 3 years ago
- ☆20 · Nov 17, 2024 · Updated last year
- Data Engineering pipeline hosted entirely in the AWS ecosystem utilizing DocumentDB as the database ☆14 · Oct 26, 2021 · Updated 4 years ago
- Security Manager for the Astronomer Airflow distribution ☆11 · Jun 25, 2024 · Updated last year
- ☆13 · Apr 8, 2023 · Updated 3 years ago
- ☆12 · Feb 23, 2024 · Updated 2 years ago
- Singer.io transformation component between Taps and Targets - PipelineWise compatible ☆20 · Sep 20, 2024 · Updated last year
- AI terminal for easy IoT and robotics development ☆45 · Jun 12, 2025 · Updated 11 months ago
- Code for the anonymous submission "Cockpit: A Practical Debugging Tool for Training Deep Neural Networks" ☆31 · Nov 24, 2020 · Updated 5 years ago