☆124Jul 24, 2025Updated 10 months ago
Alternatives and similar repositories for microbatch-hourly-deduped-tutorial
Users that are interested in microbatch-hourly-deduped-tutorial are comparing it to the libraries listed below.
- This repository goes over how to handle massive variety in data engineering☆321Jan 16, 2023Updated 3 years ago
- This repository helps teach people how to correctly define and create cumulative tables!☆764Oct 29, 2024Updated last year
- Hey this is the repo that has all the queries and data for my video game training series!☆160Jun 5, 2022Updated 3 years ago
- csv and flat-file sniffer built in Rust.☆45Jan 26, 2024Updated 2 years ago
- Example FastAPI app deployed to AWS with CDK.☆16Feb 23, 2023Updated 3 years ago
- ☆13Dec 28, 2023Updated 2 years ago
- ☆14May 1, 2024Updated 2 years ago
- This is a public repository to go over all the LLM-driven data engineering concepts.☆1,152Oct 26, 2024Updated last year
- Code for my "Efficient Data Processing in SQL" book.☆62Aug 6, 2024Updated last year
- fst: flow state tool | smooth where you want it, friction where you need it when data engineering☆33Jun 13, 2023Updated 2 years ago
- Sample project to demonstrate data engineering best practices☆219Feb 24, 2024Updated 2 years ago
- Class notes from DIO's Aceleração Dev #4 on Data Engineering, taught by Everis.☆13Feb 6, 2021Updated 5 years ago
- ☆23May 16, 2023Updated 3 years ago
- ☆11Dec 14, 2019Updated 6 years ago
- ☆10Nov 12, 2021Updated 4 years ago
- FIWARE 305: Real-time Processing of Context Data using Apache Flink☆11May 15, 2026Updated last week
- Spark implementation of Slowly Changing Dimension type 2☆11Jan 8, 2019Updated 7 years ago
- Local-first GitHub dashboard for maintainers to triage, review, and merge PRs and issues across repos without needing GitHub's built-in n…☆115Updated this week
- ☆17Nov 26, 2024Updated last year
- A data engineering personal project for applying some of my skills☆19Jul 11, 2021Updated 4 years ago
- This repository hosts materials for the Docker for Data Engineers workshop, offering hands-on exercises and resources tailored for data e…☆17May 23, 2024Updated 2 years ago
- This extension makes vscode seamlessly work with dbt and bigquery☆15Sep 27, 2022Updated 3 years ago
- A reference implementation of an end to end, open-source MLOps platform.☆15Nov 20, 2022Updated 3 years ago
- ☆18Dec 2, 2024Updated last year
- This is a repo with links to everything you'd ever want to learn about data engineering☆41,426Apr 2, 2026Updated last month
- Simple type converters: make ints, floats, bools and dates from your strings!☆11Jul 23, 2016Updated 9 years ago
- A template for preprocessing your golden dataset before putting your data into your best model☆17Jun 16, 2019Updated 6 years ago
- Using Selenium and Beautiful Soup to scrape marathon images☆10Feb 21, 2019Updated 7 years ago
- Companion repository for the "Streamlining AWS Glue CI/CD — A Comprehensive Blueprint" blog post☆11Nov 8, 2024Updated last year
- ⚡ Live demo environment for Django Templates fully rendered in the browser, with PyScript☆12Sep 21, 2022Updated 3 years ago
- Code & data to create the sequence of events linked to Di María's goal in the final of Copa América 2021☆12Dec 26, 2021Updated 4 years ago
- My final project for the Data Engineering Zoomcamp by DataTalksClub.☆10Apr 6, 2023Updated 3 years ago
- My first attempt at a rough ETL pipeline; technologies include spark, GCS, prefect orchestration, and terraform☆14Oct 12, 2022Updated 3 years ago
- Data-aware orchestration with dagster, dbt, and airbyte☆31Jan 20, 2023Updated 3 years ago
- Visualizing American Time Use Data with D3.js☆16May 12, 2016Updated 10 years ago
- A minimalistic todo app.☆11Jun 23, 2022Updated 3 years ago
- This repo contains all code and data for WWCode Python DE workshop Aug 18 and 25 2022☆25Sep 17, 2022Updated 3 years ago
- This Chrome extension lets you summarize YouTube videos using ChatGPT.☆17Dec 10, 2022Updated 3 years ago
- Spark (PySpark) script that applies dynamic time warping to Energy usage data (using the python fastdtw package)☆15Oct 22, 2016Updated 9 years ago