bitmakerla / estela
estela, an elastic web scraping cluster 🕸
☆194 · Updated last month
Alternatives and similar repositories for estela
Users that are interested in estela are comparing it to the libraries listed below
- A fork of Dragnet that also extracts author, headline, date, keywords from context, as well as built-in metadata extraction all in one pac… ☆298 · Updated 8 months ago
- Scrapy rotation proxy package with advanced functions ☆94 · Updated 3 years ago
- The Web Scraping Club Free Repository ☆158 · Updated 2 months ago
- Use AWS Lambda functions as a proxy pool to scrape web pages. ☆139 · Updated 2 years ago
- Lego AI Parser is an open-source application that uses OpenAI to parse visible text of HTML elements. ☆238 · Updated last year
- Library that helps use puppeteer in scrapy. ☆52 · Updated 5 months ago
- Zyte Automatic Extraction integration for Scrapy ☆56 · Updated 3 years ago
- Spider templates for automatic crawlers. ☆34 · Updated 2 weeks ago
- Home of the Ulixee Open Data Platform ☆56 · Updated 4 months ago
- Minimal set of tools to conduct stealthy scraping. ☆162 · Updated 2 years ago
- Clean, filter and sample URLs to optimize data collection – Python & command-line – Deduplication, spam, content and language filters ☆158 · Updated last month
- Page Object pattern for Scrapy ☆125 · Updated this week
- Get structured JSON data from any page. ☆178 · Updated 2 years ago
- Scrapy extension that gives you all the scraping monitoring, alerting, scheduling, and data validation you will need straight out of the… ☆38 · Updated last month
- Distributed crawling infrastructure running on top of serverless computation, cloud storage (such as S3) and sophisticated queues. ☆436 · Updated 3 years ago
- ☆145 · Updated 2 years ago
- Web scraping Page Objects core library ☆104 · Updated this week
- Make sense of it all. Semantic data modeling and analytics with a sprinkle of AI. https://totalhack.github.io/zillion/ ☆206 · Updated 2 weeks ago
- BuildFlow is an open source framework for building large-scale systems using Python. All you need to do is describe where your input is … ☆198 · Updated 2 years ago
- Scrapyd on container infrastructure ☆16 · Updated 9 months ago
- Parsing JavaScript objects into Python data structures ☆217 · Updated 5 months ago
- Scrapy project boilerplate done right ☆48 · Updated 11 months ago
- A Python-based HTML to text conversion library, command line client and Web service. ☆332 · Updated 2 months ago
- Piazza-Updater automates updates to a Weaviate database with real-time vectorial data. By continuously searching the internet and integra… ☆32 · Updated last year
- ☆78 · Updated 7 months ago
- Scrapfly Python SDK for headless browsers and proxy rotation ☆50 · Updated 2 weeks ago
- Module that extracts structured information from a rendered HTML site and outputs JSON. HTML to JSON. ☆70 · Updated 4 years ago
- Zyte API integration for Scrapy ☆39 · Updated this week
- Dockerized FastAPI wrapper around the recognize-anything image recognition models ☆25 · Updated last year
- Confidence- and ByT5-based geotagging model predicting coordinates from text alone. ☆160 · Updated last year