ELC / web-scraping-pipeline
This is a demo project that compares two web scraping frameworks, Playwright and Selenium, using the new pipelining tool Dagster.
☆13 · Updated 3 years ago
Alternatives and similar repositories for web-scraping-pipeline:
Users that are interested in web-scraping-pipeline are comparing it to the libraries listed below
- Scraper for various science databases ☆11 · Updated last year
- Async bulk data ingestion and querying in various document, graph and vector databases via their Python clients ☆36 · Updated last year
- Orchest quickstart pipeline ☆18 · Updated 2 years ago
- Functional composable pipelines allowing clean separation of the business logic and its implementation ☆11 · Updated 8 months ago
- A curated list of awesome open source tools and commercial products to catalog, version, and manage data 🚀 ☆29 · Updated 2 years ago
- A collection of projects I did while at General Assembly Singapore - as part of Data Science Immersive ☆11 · Updated 4 years ago
- Have UV deal with all your Jupyter deps. ☆22 · Updated 4 months ago
- Code that accompanies the PyData New York (2022) talk: Addressing the sensitivity of Large language models ☆12 · Updated 2 years ago
- A few end to end examples that use data-describe ☆16 · Updated last year
- Examples of vector DB indexing and query with various vector databases. ☆12 · Updated 3 months ago
- YouTube Transcript Cleaner is a simple web-based application that improves the readability of YouTube transcripts. ☆25 · Updated last year
- Awesome Orchest projects, both official and submitted by the community. ☆25 · Updated last year
- This repository auto-configures an Apache Pinot and Superset cluster for analyzing IRA tweets from FiveThirtyEight. ☆11 · Updated 4 years ago
- Code repository for Liquid Time-stochasticity networks (LTSs) ☆21 · Updated last year
- LLM plugin for models hosted by Anyscale Endpoints ☆32 · Updated 9 months ago
- Prefect integrations for working with OpenAI. ☆36 · Updated 9 months ago
- A Datasette plugin providing an MLOps platform to train, eval and predict machine learning models ☆16 · Updated this week
- Data exchange and persistence based on human-readable files ☆22 · Updated last month
- ☆10 · Updated this week
- Wave Partial Differential Equation Solver in Python ☆12 · Updated 7 months ago
- GraphRag vs Embeddings ☆13 · Updated 6 months ago
- Postgres extensions to support end-to-end Retrieval-Augmented Generation (RAG) pipelines ☆36 · Updated this week
- Evolutionary Search for expert-level performance on any task with environmental feedback ☆14 · Updated 11 months ago
- LLM plugin providing access to the LLM documentation ☆16 · Updated 2 months ago
- ☆23 · Updated last month
- An awesome list of longevity resources for living longer, healthier lives. ☆29 · Updated last year
- SQL functions for calling OpenAI APIs ☆21 · Updated 2 years ago
- Efficient query encoding for dense retrieval ☆11 · Updated 5 months ago
- LLM access to models by Anthropic, including the Claude series ☆14 · Updated last month
- A Python library to generate static data catalog sites. Carte scrapes metadata from your data assets and generates a fully searchable fro… ☆27 · Updated 2 years ago