ELC / web-scraping-pipeline
This is a demo project that compares two web scraping frameworks, Playwright and Selenium, and uses the new pipelining tool Dagster.
☆16 Updated 3 years ago
Alternatives and similar repositories for web-scraping-pipeline
Users who are interested in web-scraping-pipeline are comparing it to the libraries listed below
- Async bulk data ingestion and querying in various document, graph and vector databases via their Python clients ☆37 Updated last year
- All Saleor services started from a single repository with Ansible, Terraform, and Kubernetes. ☆20 Updated 4 years ago
- Python SDK for Permit.io: Plug & Play Application Level Authorization ☆14 Updated 4 months ago
- ☆11 Updated last year
- Python bindings for Upwork API (OAuth2) ☆44 Updated 8 months ago
- The Upwork Screener App is a tool for searching and filtering job listings on Upwork. It allows users to easily find jobs that match thei… ☆11 Updated 2 years ago
- Python wrapper for Ferret ☆42 Updated 3 years ago
- YouTube Transcript Cleaner is a simple web-based application that improves the readability of YouTube transcripts. ☆27 Updated 5 months ago
- Stella - A scalable platform for creating and managing AI agents ☆23 Updated last year
- POC integration Airbyte+Dagster+Langchain ☆13 Updated 2 years ago
- A Datasette plugin that adds UI elements to edit, insert, or delete rows in SQLite tables ☆21 Updated 7 months ago
- Collect and process data from various data sources: website / JS website / API. Run the scraping pipeline via Celery and a Travis cron task. Du… ☆13 Updated last year
- TextGraphs + LLMs + graph ML for entity extraction, linking, ranking, and constructing a lemma graph ☆25 Updated last year
- Geniusrise: Framework for building geniuses ☆60 Updated last year
- Singer.io Tap for extracting data from the Google Analytics Reporting API ☆12 Updated 2 weeks ago
- Plugin for LLM adding support for Google's PaLM 2 model ☆14 Updated last year
- Powered by SideGuide and GPT-3 ☆12 Updated 2 years ago
- Summarize a URL with GPT-3 Completions ☆26 Updated last month
- Repo demonstrating a Dagster pipeline to generate Neo4j Graph ☆22 Updated 4 years ago
- A Selenium webscraper for Etsy that takes search terms and the number of pages you want scraped as inputs, and returns pertinent details … ☆27 Updated 4 years ago
- The Selenium scraper that collected a million stories from Medium.com ☆80 Updated 6 years ago
- Contextual Multi-Armed Bandit Platform for Scoring, Ranking & Decisions ☆22 Updated 2 years ago
- A curated list of Dagster code snippets for data engineers ☆57 Updated last year
- Command-line interface that lets you interact with Hightouch resources ☆13 Updated 3 years ago
- Scrapfly Python SDK for headless browsers and proxy rotation ☆47 Updated 3 months ago
- ☆12 Updated last year
- Tasks as HTTP endpoints ☆27 Updated last year
- Index and search your personal data quickly and privately. ☆28 Updated 3 years ago
- SQL functions for calling OpenAI APIs ☆22 Updated 2 years ago
- Awesome Orchest projects, both official and submitted by the community. ☆25 Updated last year