ELC / web-scraping-pipeline
This is a demo project comparing two web scraping frameworks, Playwright and Selenium, using the new pipelining tool Dagster.
☆13 Updated 3 years ago
Alternatives and similar repositories for web-scraping-pipeline:
Users who are interested in web-scraping-pipeline are comparing it to the libraries listed below.
- Async bulk data ingestion and querying in various document, graph and vector databases via their Python clients ☆36 Updated last year
- ☆11 Updated 2 months ago
- Datasette plugin for authenticating access using API tokens ☆11 Updated 6 months ago
- Functional composable pipelines allowing clean separation of the business logic and its implementation ☆11 Updated 10 months ago
- Maintain a FAISS index for specified Datasette tables ☆36 Updated 9 months ago
- Scraper for various science databases ☆11 Updated last year
- Awesome Orchest projects, both official and submitted by the community. ☆25 Updated last year
- ☆11 Updated 2 months ago
- A curated list of awesome open source tools and commercial products to catalog, version, and manage data 🚀 ☆32 Updated 2 years ago
- Statistical visualizations for Datasette using Seaborn ☆12 Updated 3 years ago
- Prefect integrations for working with OpenAI. ☆35 Updated 11 months ago
- SQL functions for calling OpenAI APIs ☆21 Updated 2 years ago
- A text-to-SQL prototype on the northwind sqlite dataset ☆12 Updated 6 months ago
- Plugin for LLM adding support for Google's PaLM 2 model ☆14 Updated last year
- Package Manager is a JupyterLab extension that simplifies managing Python packages directly within your notebooks ☆14 Updated last month
- ☆9 Updated 2 months ago
- A collection of projects I did while at General Assembly Singapore - as part of Data Science Immersive ☆11 Updated 4 years ago
- A few end to end examples that use data-describe ☆16 Updated last year
- 🛠 Self-hosted, fast, and consistent remote configuration for apps. ☆14 Updated 2 years ago
- Scrape various open data directories to create an index of what's available out there ☆36 Updated last month
- Tools for building SQLite databases from files and directories ☆12 Updated last year
- InGen is a command line tool written on top of pandas and great_expectations to perform small scale data transformations and validations … ☆14 Updated 3 months ago
- Visualisation of browsing history patterns using pandas and seaborn ☆10 Updated 4 years ago
- Orchest quickstart pipeline ☆18 Updated 2 years ago
- Learn Kubeflow with Arrikto ☆15 Updated 3 years ago
- JupyterLite as a Datasette plugin ☆11 Updated 3 years ago
- Credit Score Provider for the Faker Python Project. Use this to generate fake but realistic-looking consumer credit scores aligning to th… ☆18 Updated 5 months ago
- ☆12 Updated last year
- Concatenated documentation for use with LLMs ☆17 Updated last month
- Data exchange and persistence based on human-readable files ☆22 Updated 3 months ago