ELC / web-scraping-pipeline
This is a demo project that compares two web scraping frameworks, Playwright and Selenium, using the new pipelining tool Dagster.
☆13 Updated 3 years ago
Related projects
Alternatives and complementary repositories for web-scraping-pipeline
- Python context manager to communicate with a subprocess using iterables: for when data is too big to fit in memory and has to be streamed ☆9 Updated last month
- Datasette plugin for authenticating access using API tokens ☆12 Updated 2 months ago
- Functional composable pipelines allowing clean separation of the business logic and its implementation ☆11 Updated 5 months ago
- A curated list of awesome open source tools and commercial products to catalog, version, and manage data 🚀 ☆27 Updated 2 years ago
- Examples of vector DB indexing and querying with various vector databases. ☆12 Updated last month
- Object detection inference with Roboflow Train models on NVIDIA Jetson devices. ☆13 Updated last year
- Support files exposing JSON from the JSON Schema specifications to Python ☆11 Updated this week
- Async bulk data ingestion and querying in various document, graph and vector databases via their Python clients ☆33 Updated last year
- A few end-to-end examples that use data-describe ☆16 Updated last year
- SQL functions for calling OpenAI APIs ☆21 Updated last year
- Apache Spark-based framework for analyzing A/B experiments ☆11 Updated 2 weeks ago
- Common Paper Service Level Agreement ☆13 Updated 7 months ago
- Ssebowa is a free and open source Python library that provides generative AI models. ☆14 Updated 9 months ago
- Have UV deal with all your Jupyter deps. ☆18 Updated 2 months ago
- Scripts and ideas to manage tons and tons of images and movies ☆16 Updated this week
- Build a directory full of files into a SQLite database ☆13 Updated 10 months ago
- Prefect integrations for working with OpenAI. ☆36 Updated 6 months ago
- A graph definition and execution library for Python ☆16 Updated last year
- KML utilities for the ElementTree API ☆19 Updated 2 years ago
- A Python package that simplifies the use of secrets in a Jupyter notebook ☆21 Updated 3 years ago
- Triptych for data exchange and persistence ☆23 Updated 8 months ago
- Orchest quickstart pipeline ☆17 Updated 2 years ago
- Code that accompanies the PyData New York (2022) talk: Addressing the sensitivity of Large language models ☆12 Updated 2 years ago
- JupyterLite as a Datasette plugin ☆11 Updated 3 years ago
- 🛠 Self-hosted, fast, and consistent remote configuration for apps. ☆12 Updated 2 years ago
- 🛷 cool task runner ☆11 Updated last month
- Tool to take your ML model from local to production with one line of code. ☆23 Updated 10 months ago
- Build interactive big data apps with Altair and Vega easily using Panel + VegaFusion. ☆17 Updated 2 years ago
- Plugin for Intake to read from SQL servers ☆15 Updated last year