bitmakerla / estela
estela, an elastic web scraping cluster 🕸
☆189 Updated this week
Alternatives and similar repositories for estela
Users that are interested in estela are comparing it to the libraries listed below
- A fork of Dragnet that also extracts author, headline, date, keywords from context, as well as built-in metadata extraction all in one pac… ☆295 Updated 5 months ago
- Scrapy rotation proxy package with advanced functions ☆95 Updated 3 years ago
- The Web Scraping Club Free Repository ☆151 Updated 2 weeks ago
- Home of the Ulixee Open Data Platform ☆55 Updated last month
- Lego AI Parser is an open-source application that uses OpenAI to parse visible text of HTML elements. ☆234 Updated last year
- ☆142 Updated last year
- Clean, filter and sample URLs to optimize data collection - Python & command-line - Deduplication, spam, content and language filters ☆149 Updated 10 months ago
- Scrapy extension that gives you all the scraping monitoring, alerting, scheduling, and data validation you will need straight out of the… ☆38 Updated last year
- ☆77 Updated 4 months ago
- Library that helps use puppeteer in scrapy. ☆52 Updated 3 months ago
- Page Object pattern for Scrapy ☆123 Updated 2 weeks ago
- Scrapy Extension for monitoring spiders execution. ☆547 Updated 6 months ago
- Distributed crawling infrastructure running on top of serverless computation, cloud storage (such as S3) and sophisticated queues. ☆436 Updated 2 years ago
- Zyte Automatic Extraction integration for Scrapy ☆56 Updated 3 years ago
- Web scraping Page Objects core library ☆101 Updated this week
- Web scraping for humans ☆957 Updated 11 months ago
- Minimal set of tools to conduct stealthy scraping. ☆160 Updated 2 years ago
- 🕷️ Scrapyd is an application for deploying and running Scrapy spiders. ☆87 Updated 2 months ago
- Use AWS Lambda functions as a proxy pool to scrape web pages. ☆139 Updated last year
- Article extraction benchmark: dataset and evaluation scripts ☆336 Updated last month
- Spider templates for automatic crawlers. ☆32 Updated last month
- Get structured JSON data from any page. ☆178 Updated 2 years ago
- undetected chromedriver Docker ☆35 Updated 2 years ago
- Scrapy project boilerplate done right ☆48 Updated 8 months ago
- A Python-based HTML to text conversion library, command line client and Web service. ☆323 Updated last week
- Parsing JavaScript objects into Python data structures ☆214 Updated 2 months ago
- dude uncomplicated data extraction: A simple framework for writing web scrapers using Python decorators ☆428 Updated 7 months ago
- Make sense of it all. Semantic data modeling and analytics with a sprinkle of AI. https://totalhack.github.io/zillion/ ☆203 Updated 5 months ago
- Module that extracts structured information from a rendered HTML site and outputs JSON. HTML to JSON. ☆70 Updated 4 years ago
- Fast and robust date extraction from web pages, with Python or on the command-line ☆141 Updated 3 months ago