Awesome list for data pipelines
☆35Feb 6, 2023Updated 3 years ago
Alternatives and similar repositories for awesome-data-pipeline
Users that are interested in awesome-data-pipeline are comparing it to the libraries listed below.
- NoSQL extract, transform, load (ETL) toolkit with Python☆16Updated this week
- Spark Structured Streaming data pipeline that processes movie ratings data in real-time.☆14Apr 15, 2026Updated 2 weeks ago
- 🌟 An end-to-end full-stack data science project, including modelling, MLOps, and data storytelling. ✨☆16Aug 30, 2025Updated 7 months ago
- In this project I have built an ETL pipeline which scrapes the trending repository based on month, week and day LIVE extract other related info…☆12Sep 9, 2023Updated 2 years ago
- A data engineering project with Airflow, dbt, Terraform, GCP and much more!☆26Nov 8, 2022Updated 3 years ago
- Scrape South African news☆12May 22, 2023Updated 2 years ago
- velib-v2: An ETL pipeline that employs batch and streaming jobs using Spark, Kafka, Airflow, and other tools, all orchestrated with Docke…☆20Aug 12, 2025Updated 8 months ago
- Codebase, data and models for the Headline Grouping paper at NAACL2021☆12Oct 2, 2022Updated 3 years ago
- A Data Engineering Project that implements an ETL data pipeline using Dagster, Apache Spark, Streamlit, MinIO, Metabase, Dbt, Polars, Doc…☆23Nov 19, 2024Updated last year
- end-to-end information extraction pipeline built by LayoutLMV2, pretrained model from HuggingFace☆11Aug 15, 2023Updated 2 years ago
- Open source RAG with Llama Index for Japanese LLM in low resource setting☆10May 12, 2025Updated 11 months ago
- automated insights for tabular data☆10Feb 10, 2025Updated last year
- Python scripts to search for real estate on realtor.com and zillow.com☆14Nov 13, 2021Updated 4 years ago
- Newspaper Segmentation into images and text☆12Jan 11, 2019Updated 7 years ago
- Using Siamese LSTM to classify repeated Quora questions. Attempted pretrained BERT embeddings, Word2Vec and training own embeddings toget…☆10Aug 28, 2020Updated 5 years ago
- Triton backend for https://github.com/OpenNMT/CTranslate2☆11Aug 20, 2024Updated last year
- This repo contains the code for the tutorial for using the CrewAI agent framework to generate Sales Reports based on Salesforce data☆13Mar 16, 2024Updated 2 years ago
- repo of files pertaining to realtime, offline translations using whisper realtime and argos translate. This repo is marked Creative Commo…☆19May 20, 2025Updated 11 months ago
- Integrate Claude Code and Gemini CLI into your Obsidian workflow☆24Aug 21, 2025Updated 8 months ago
- ☆26Dec 18, 2020Updated 5 years ago
- This AI tool leverages different LLM services to generate product information from a given image. Simply upload an image of a product and…☆15Jun 25, 2024Updated last year
- A simple package of face detection☆14Nov 27, 2020Updated 5 years ago
- Summary and archive of Vatican .va (Holy See) ccTLD zone data for researchers.☆13Apr 26, 2023Updated 3 years ago
- Highlights the current yank.☆12Jul 13, 2022Updated 3 years ago
- Built a stream processing data pipeline to get data from disparate systems into a dashboard using Kafka as an intermediary.☆29Aug 14, 2023Updated 2 years ago
- Simple ETL pipeline using Python☆29May 22, 2023Updated 2 years ago
- ☆12Apr 9, 2021Updated 5 years ago
- A minimal, configurable and highly optimized markdown2html compiler, supports macros, watch mode, syntax highlighting, latex math and liv…☆14Aug 10, 2023Updated 2 years ago
- Various stuff and tweaks I have around Obsidian☆12Jun 20, 2025Updated 10 months ago
- ☆15Nov 28, 2023Updated 2 years ago
- An end-to-end real-time stock market data pipeline with Python, AWS EC2, Apache Kafka, and Cassandra. Data is processed on AWS EC2 with Apa…☆29Jun 7, 2023Updated 2 years ago
- Contains code for C3D, LCN and TSM for action recognition models.☆10May 31, 2020Updated 5 years ago
- Short guide on how to connect to Termux SSH from anywhere while using TailScale as connection link.☆13Aug 30, 2021Updated 4 years ago
- ☆10Feb 3, 2020Updated 6 years ago
- Accompanies Finastra's Hack to the Future 4 Learning Session "Sustainability reports & NLP"☆10Mar 17, 2022Updated 4 years ago
- Synchronize properties from your Obsidian notes with a Markwhen timeline file.☆12Sep 20, 2025Updated 7 months ago
- ☆13Aug 20, 2021Updated 4 years ago
- The goal of this project is to illustrate Extract Transform Load (ETL) using Python and SQL. ETL is a process commonly done in computing,…☆34Sep 7, 2021Updated 4 years ago
- Audio Classification with machine learning☆18Apr 13, 2026Updated 2 weeks ago