Python data repo: Jupyter notebooks, Python scripts, and data.
☆551 · Dec 10, 2024 · Updated last year
Alternatives and similar repositories for pythondataanalysis
Users who are interested in pythondataanalysis are comparing it to the libraries listed below.
- ☆16 · May 29, 2023 · Updated 2 years ago
- build dw with dbt ☆54 · Oct 24, 2024 · Updated last year
- This project demonstrates how to build and automate an ETL pipeline written in Python and schedule it using open source Apache Airflow or… ☆20 · Aug 21, 2025 · Updated 7 months ago
- Distributed Data Systems with Azure Databricks, published by Packt ☆12 · Jan 18, 2023 · Updated 3 years ago
- trino + hive + minio with postgres in docker compose ☆27 · Aug 18, 2023 · Updated 2 years ago
- ☆38 · Jan 27, 2026 · Updated 2 months ago
- ☆16 · Mar 9, 2026 · Updated last month
- A Python package that creates fine-grained dbt tasks on Apache Airflow ☆20 · Apr 25, 2024 · Updated last year
- code snippet for analytics sessions ☆34 · May 17, 2022 · Updated 3 years ago
- ☆16 · Mar 12, 2025 · Updated last year
- ☆32 · Oct 4, 2024 · Updated last year
- ☆22 · Feb 5, 2024 · Updated 2 years ago
- A variation on a standard Decision Tree such as that in sklearn, where nodes may be based on an aggregation of multiple splits. ☆10 · May 24, 2024 · Updated last year
- Airflow Tutorials ☆25 · Feb 28, 2021 · Updated 5 years ago
- Get data from API, run a scheduled script with Airflow, send data to Kafka and consume with Spark, then write to Cassandra ☆146 · Jul 27, 2023 · Updated 2 years ago
- Data Engineering Zoomcamp is a free 9-week course on building production-ready data pipelines. The next cohort starts in January 2026. Jo… ☆40,011 · Apr 8, 2026 · Updated last week
- Acquiring and processing information on world's largest banks ☆18 · Jun 17, 2025 · Updated 10 months ago
- Docker with Airflow + Postgres + Spark cluster + JDK (spark-submit support) + Jupyter Notebooks ☆24 · Apr 2, 2022 · Updated 4 years ago
- Data-aware orchestration with dagster, dbt, and airbyte ☆31 · Jan 20, 2023 · Updated 3 years ago
- ETL pipeline using pyspark (Spark - Python) ☆118 · Apr 4, 2020 · Updated 6 years ago
- I saw this [Blog Post](https://www.morling.dev/blog/one-billion-row-challenge/) on a Billion Row challenge for Java so naturally I tried … ☆14 · Jan 10, 2024 · Updated 2 years ago
- Essential PySpark for Scalable Data Analytics, published by Packt ☆46 · Mar 2, 2026 · Updated last month
- ☆47 · Feb 23, 2021 · Updated 5 years ago
- Using Plotly to create a heatmap visualization of monthly and hourly data ☆13 · Aug 9, 2021 · Updated 4 years ago
- An LLM-powered self-studying app using retrieval-augmented generation prompting | Streamlit LLM Hackathon 2023 ☆17 · Oct 6, 2023 · Updated 2 years ago
- ☆67 · Sep 24, 2025 · Updated 6 months ago
- ☆135 · Mar 16, 2026 · Updated last month
- ☆19 · Nov 27, 2023 · Updated 2 years ago
- ☆17 · Dec 9, 2022 · Updated 3 years ago
- PyRapidML is an open source Python library which not only helps in automating Machine Learning Workflows but also helps in building end t… ☆14 · Aug 7, 2021 · Updated 4 years ago
- Sample project to demonstrate data engineering best practices ☆214 · Feb 24, 2024 · Updated 2 years ago
- New Generation Opensource Data Stack Demo ☆456 · Feb 6, 2023 · Updated 3 years ago
- dbt + Trino demo project, using TPC-H sample data ☆19 · Mar 27, 2024 · Updated 2 years ago
- ☆46 · Jul 6, 2024 · Updated last year
- ☆339 · Aug 13, 2024 · Updated last year
- Simple project using pyflink, kafka and postgre containerized using Docker ☆11 · Aug 26, 2024 · Updated last year
- A template repository to create a data project with IAC, CI/CD, Data migrations, & testing ☆291 · Jul 11, 2024 · Updated last year
- A Postgres data warehouse for processing synthetic data using IAC principles ☆19 · Feb 27, 2023 · Updated 3 years ago
- This reference architecture demonstrates the use of AWS Step Functions to orchestrate an Extract Transfer Load (ETL) workflow with AWS La… ☆24 · Jun 16, 2020 · Updated 5 years ago