szelenka / prefect-webscraper-example
Quick and dirty example of using Prefect Core to scrape a website
☆22 Updated 4 years ago
Related projects:
- A fully-featured multi-source data pipeline for continuously extracting knowledge from COVID-19 data. ☆21 Updated 3 years ago
- Curated list of awesome software and resources for Senzing, The First Real-Time AI for Entity Resolution. ☆51 Updated 3 weeks ago
- A maximum-strength name parser for record linkage. ☆29 Updated last month
- Scraping Assisted by Learning ☆35 Updated last week
- This is the code accompanying the blog article on makeitnew.io. It defines a Prefect flow which can be visualized, run locally or registe… ☆28 Updated 4 years ago
- Python package for performing deduplication using flexible text matching and cleaning in pandas DataFrames ☆25 Updated 3 years ago
- Scrape various open data directories to create an index of what's available out there ☆29 Updated this week
- A Python client library for the Stitch Import API ☆42 Updated 8 months ago
- Simple job postings scraper for Indeed based on requests and BeautifulSoup ☆14 Updated 2 years ago
- A Python library to generate static data catalog sites. Carte scrapes metadata from your data assets and generates a fully searchable fro… ☆26 Updated 2 years ago
- This repository is part of an article "Prefect workflow automation with Azure DevOps and AKS" ☆31 Updated 3 years ago
- Dagster scikit-learn pipeline example. ☆43 Updated last year
- Running Python Code in BigQuery UDFs ☆23 Updated 3 years ago
- A financial disclosure data extraction tool. ☆13 Updated last year
- Centralized whale instance using GitHub Actions, sourcing metadata from bigquery-public-data. ☆17 Updated 3 months ago
- A simple HTML table scraper made with Python and the amazing Streamlit! ☆19 Updated last year
- quadipy is a Python package to help transform structured data into RDF graph format ☆18 Updated last year
- Ibis analytics, with Ibis (and more!) ☆19 Updated this week
- A template repository with all the fundamentals needed to develop and deploy a Python data-processing routine for Prefect pipelines. ☆20 Updated 2 years ago
- A small Python module containing quick utility functions for standard ETL processes. ☆33 Updated this week
- A scraping Master-slave system based on Google App Engine ☆10 Updated 3 years ago
- Async bulk data ingestion and querying in various document, graph and vector databases via their Python clients ☆32 Updated 10 months ago
- Singer.io Tap for extracting data from the Google Analytics Reporting API ☆11 Updated this week
- Repo demonstrating a Dagster pipeline to generate Neo4j Graph ☆21 Updated 3 years ago
- Datasette plugin providing instructions for exporting data to Jupyter or Observable ☆12 Updated last year
- Pipeline definitions for managing data flows to power analytics at MIT Open Learning ☆36 Updated this week