ELC / web-scraping-pipeline
This is a demo project comparing two web scraping frameworks, Playwright and Selenium, using the new pipelining tool Dagster.
☆13 · Updated 3 years ago
Alternatives and similar repositories for web-scraping-pipeline:
Users who are interested in web-scraping-pipeline are comparing it to the libraries listed below.
- Async bulk data ingestion and querying in various document, graph and vector databases via their Python clients ☆36 · Updated last year
- Functional composable pipelines allowing clean separation of the business logic and its implementation ☆11 · Updated 8 months ago
- YouTube Transcript Cleaner is a simple web-based application that improves the readability of YouTube transcripts. ☆25 · Updated last year
- Awesome Orchest projects, both official and submitted by the community. ☆25 · Updated last year
- A graph definition and execution library for Python ☆16 · Updated last year
- Scraper for various science databases ☆11 · Updated last year
- A swarm of LLM agents that will help you test, document, and productionize your code! ☆13 · Updated last week
- Plugin for LLM adding support for Google's PaLM 2 model ☆14 · Updated last year
- Jim is a simple, beautiful Jupyter notebook editor for macOS ☆29 · Updated last year
- Ssebowa is a free and open-source Python library that provides generative AI models. ☆14 · Updated last year
- A collection of tools that can be used for LLM function calling ☆32 · Updated 11 months ago
- A few end-to-end examples that use data-describe ☆16 · Updated last year
- A collection of projects I did at General Assembly Singapore as part of the Data Science Immersive ☆11 · Updated 4 years ago
- Code that accompanies the PyData New York (2022) talk: Addressing the sensitivity of Large language models ☆13 · Updated 2 years ago
- Prefect integrations for working with OpenAI. ☆36 · Updated 9 months ago
- This repository auto-configures an Apache Pinot and Superset cluster for analyzing IRA tweets from FiveThirtyEight. ☆11 · Updated 4 years ago
- A curated list of awesome open source tools and commercial products to catalog, version, and manage data 🚀 ☆30 · Updated 2 years ago
- Scripts and ideas to manage tons and tons of images and movies ☆16 · Updated last week
- Web crawler for Burplist, a search engine for craft beers in Singapore ☆14 · Updated this week
- A Datasette plugin providing an MLOps platform to train, eval and predict machine learning models ☆15 · Updated this week
- Batch processing using joblib including tqdm progress bars ☆20 · Updated 3 years ago
- Generate beautiful, testable documentation with Jupyter Notebooks ☆21 · Updated 2 years ago
- A Python package for running directed acyclic graphs of asynchronous I/O operations ☆16 · Updated 3 years ago
- Tools for encoding Magic: The Gathering cards into a form suitable for AI text generation ☆19 · Updated 3 years ago
- Have UV deal with all your Jupyter deps. ☆22 · Updated 5 months ago
- Maintain a FAISS index for specified Datasette tables ☆35 · Updated 8 months ago
- SQL functions for calling OpenAI APIs ☆21 · Updated 2 years ago
- Build complex types from simple blueprints with Pydantic ☆24 · Updated last month
- Examples of vector DB indexing and query with various vector databases. ☆12 · Updated last week