An open-source project for building robust data pipelines and scalable software infrastructure. We use widely adopted, industry-standard developer tools to improve efficiency and reliability. The pipelines are also field-tested on farms across Sumatra, Indonesia, to check that they hold up under real-world conditions.
☆32 · May 27, 2024 · Updated last year
Alternatives and similar repositories for orangutan-stem
Users interested in orangutan-stem are comparing it to the libraries listed below.
- Scripts to convert tables from SQL Server to Snowflake · ☆13 · Jun 27, 2019 · Updated 6 years ago
- Code for the paper "A particle swarm optimization-based flexible convolutional auto-encoder for image classification" published b… · ☆10 · Jul 21, 2020 · Updated 5 years ago
- Repo that will help you explore how to build a hybrid workflow using Apache Airflow and Amazon ECS Anywhere · ☆11 · Jul 12, 2022 · Updated 3 years ago
- This repository is a working ETL framework that uses user data from the Spotify API, using ➲Python for Extraction and Transformation ➲SQL… · ☆12 · Apr 16, 2023 · Updated 2 years ago
- Lightweight, open source, locally-hosted Modern Data Stack · ☆17 · Apr 7, 2025 · Updated 11 months ago
- Online Fashion Commerce Website · ☆11 · Nov 24, 2025 · Updated 3 months ago
- B19415 - The Definitive Guide to Data Integration · ☆11 · Apr 15, 2024 · Updated last year
- Simple Python implementation of stochastic gradient descent for neural networks via backpropagation. · ☆12 · Dec 29, 2023 · Updated 2 years ago
- Extract GLCM, region-property, and moment-related features in one line of code and load them into a dataframe. · ☆13 · Sep 15, 2020 · Updated 5 years ago
- Create a Kubernetes cluster on AWS in 3 minutes by typing only 'terraform apply'. · ☆16 · Jul 5, 2019 · Updated 6 years ago
- This repository contains an example of how to leverage Cloud Composer and Cloud Dataflow to move data from a Microsoft SQL Server to BigQ… · ☆19 · Jun 10, 2025 · Updated 9 months ago
- An end-to-end Twitter Data Pipeline that extracts data from Twitter and loads it into AWS S3. · ☆15 · Aug 26, 2023 · Updated 2 years ago
- velib-v2: An ETL pipeline that employs batch and streaming jobs using Spark, Kafka, Airflow, and other tools, all orchestrated with Docke… · ☆20 · Aug 12, 2025 · Updated 6 months ago
- Peakrs Dataframe is a library and framework that facilitates the extraction, transformation, and loading (ETL) of data. · ☆18 · Oct 26, 2023 · Updated 2 years ago
- Data pipeline for uploading, preprocessing, and visualising COVID-19 data · ☆18 · Apr 1, 2023 · Updated 2 years ago
- Highly efficient GLCM/X-GLCM feature extractor for Python. · ☆20 · Aug 8, 2017 · Updated 8 years ago
- A guide to Ultralytics' mission, vision, values, and practices, providing key insights and resources for aligning with our goals. · ☆62 · Feb 18, 2026 · Updated 2 weeks ago
- PromethAI app · ☆29 · Mar 5, 2024 · Updated 2 years ago
- Data Engineering Capstone Project: ETL Pipelines and Data Warehouse Development · ☆21 · Jul 9, 2019 · Updated 6 years ago
- ELT With Airflow Helper - Classes and functions to make Apache Airflow life easier · ☆12 · Feb 27, 2026 · Updated last week
- Platform for Analysis and Labeling of Medical Time Series · ☆25 · Dec 19, 2020 · Updated 5 years ago
- Food for thought around data contracts · ☆32 · Jul 24, 2025 · Updated 7 months ago
- Course material for the Data Engineering on AWS course · ☆31 · Sep 9, 2024 · Updated last year
- A Flask auto importer that allows your Flask apps to grow big. · ☆23 · Feb 22, 2026 · Updated 2 weeks ago
- F1 Data Pipeline · ☆25 · Jul 1, 2023 · Updated 2 years ago
- Transaction processing & vis pipeline using PySpark Streaming · ☆30 · Jul 18, 2024 · Updated last year
- Data Engineering with Google Cloud Platform, published by Packt · ☆121 · Sep 20, 2023 · Updated 2 years ago
- Ultralytics LLM-related experiments · ☆83 · Jan 22, 2026 · Updated last month
- Flask Best Practices for Deployment | AppSeed · ☆32 · May 30, 2022 · Updated 3 years ago
- Department of Education (DOE) for New South Wales (AUS) data stack in a box · ☆36 · Nov 13, 2024 · Updated last year
- Arcane Insight is a data analytics project designed to harness the power of SQLMesh & DuckDB to collect, transform, and analyze data from… · ☆38 · Jan 23, 2025 · Updated last year
- A "modern" Strava data pipeline fueled by dlt, duckdb, dbt, and evidence.dev · ☆40 · May 11, 2025 · Updated 9 months ago
- A cheap, serverless version of Snowplow deployed with Terraform that runs on dumky.net · ☆43 · Feb 8, 2024 · Updated 2 years ago