bitmakerla / estela
estela, an elastic web scraping cluster 🕸
☆188, updated last month
Alternatives and similar repositories for estela
Users that are interested in estela are comparing it to the libraries listed below
- A fork of Dragnet that also extracts author, headline, date, and keywords from context, as well as built-in metadata extraction, all in one pac… (☆294, updated 4 months ago)
- Scrapy rotation proxy package with advanced functions (☆95, updated 3 years ago)
- The Web Scraping Club Free Repository (☆151, updated 5 months ago)
- Clean, filter and sample URLs to optimize data collection – Python & command-line – Deduplication, spam, content and language filters (☆148, updated 9 months ago)
- Minimal set of tools to conduct stealthy scraping. (☆160, updated 2 years ago)
- Lego AI Parser is an open-source application that uses OpenAI to parse visible text of HTML elements. (☆235, updated last year)
- Web scraping Page Objects core library (☆101, updated last week)
- A Python-based HTML to text conversion library, command-line client and web service. (☆323, updated 2 months ago)
- Fast and robust date extraction from web pages, with Python or on the command line (☆141, updated 2 months ago)
- Library that helps use Puppeteer in Scrapy. (☆52, updated 2 months ago)
- Page Object pattern for Scrapy (☆121, updated 2 weeks ago)
- (☆142, updated last year)
- Home of the Ulixee Open Data Platform (☆55, updated last month)
- Scrapy extension for monitoring spider execution. (☆546, updated 6 months ago)
- Zyte Automatic Extraction integration for Scrapy (☆56, updated 3 years ago)
- Distributed crawling infrastructure running on top of serverless computation, cloud storage (such as S3) and sophisticated queues. (☆432, updated 2 years ago)
- Article extraction benchmark: dataset and evaluation scripts (☆331, updated 2 weeks ago)
- Scrapy project boilerplate done right (☆48, updated 8 months ago)
- Use AWS Lambda functions as a proxy pool to scrape web pages. (☆137, updated last year)
- Spider templates for automatic crawlers. (☆32, updated 2 weeks ago)
- (☆77, updated 3 months ago)
- Scrapy extension that gives you all the scraping monitoring, alerting, scheduling, and data validation you will need straight out of the… (☆38, updated last year)
- Parsing JavaScript objects into Python data structures (☆214, updated 2 months ago)
- Module that extracts structured information from a rendered HTML site and outputs JSON (HTML to JSON). (☆70, updated 4 years ago)
- Detect and classify pagination links (☆103, updated this week)
- Comprehensive wrapper and execution manager for the Chrome browser using the Chrome Debugging Protocol. (☆227, updated 4 months ago)
- Scrapfly Python SDK for headless browsers and proxy rotation (☆47, updated last month)
- dude uncomplicated data extraction: A simple framework for writing web scrapers using Python decorators (☆428, updated 6 months ago)
- playwright stealth (☆811, updated last year)
- Intelligent browser header & fingerprint generator (☆753, updated 6 months ago)