ELC / web-scraping-pipeline
This is a demo project that compares two web scraping frameworks, Playwright and Selenium, using the new pipelining tool Dagster.
☆13Updated 3 years ago
Related projects
Alternatives and complementary repositories for web-scraping-pipeline
- A few end to end examples that use data-describe☆16Updated last year
- Functional composable pipelines allowing clean separation of the business logic and its implementation☆11Updated 5 months ago
- Have UV deal with all your Jupyter deps.☆18Updated 2 months ago
- Apache Spark-based framework for analyzing A/B experiments☆11Updated this week
- a graph definition and execution library for python☆16Updated last year
- A curated list of awesome open source tools and commercial products to catalog, version, and manage data 🚀☆25Updated 2 years ago
- SQL functions for calling OpenAI APIs☆21Updated last year
- scraping and querying documents for LLMs☆13Updated this week
- Triptych for data exchange and persistence☆22Updated 7 months ago
- Yet Another Web Extraction SDK☆14Updated this week
- Scraper for various science databases☆11Updated last year
- Awesome Orchest projects, both official and submitted by the community.☆25Updated last year
- Code that accompanies the PyData New York (2022) talk: Addressing the sensitivity of Large language models☆12Updated 2 years ago
- This repository auto-configures an Apache Pinot and Superset cluster for analyzing IRA tweets from FiveThirtyEight.☆11Updated 4 years ago
- A Datasette plugin providing an MLOps platform to train, eval and predict machine learning models☆15Updated last week
- Tool to take your ML model from local to production with one line of code.☆23Updated 9 months ago
- Python context manager to communicate with a subprocess using iterables: for when data is too big to fit in memory and has to be streamed☆9Updated last month
- Datasette plugin for authenticating access using API tokens☆12Updated 2 months ago
- Graph-and-node based workflows☆12Updated this week
- Statistical visualizations for Datasette using Seaborn☆11Updated 2 years ago
- Orchest quickstart pipeline☆17Updated 2 years ago
- Singer.io Tap for extracting data from the Google Analytics Reporting API☆11Updated this week
- Maintain a FAISS index for specified Datasette tables☆34Updated 4 months ago
- Scripts and ideas to manage tons and tons of images and movies☆15Updated last week
- A python package for running directed acyclic graphs of asynchronous I/O operations☆15Updated 3 years ago
- Prefect integrations for working with OpenAI.☆36Updated 6 months ago
- ☆29Updated 10 months ago
- Ssebowa is a free and open-source Python library that provides generative AI models.☆14Updated 9 months ago
- 🛠 Self-hosted, fast, and consistent remote configuration for apps.☆12Updated 2 years ago