wanghaisheng / awesome-web-data-extractor
A curated list of promising Web Data Extractors resources
☆28 Updated 5 years ago
Alternatives and similar repositories for awesome-web-data-extractor
Users who are interested in awesome-web-data-extractor are comparing it to the libraries listed below.
- This Python package can be used to systematically extract multiple data elements (e.g., title, keywords, text) from news sources around t… ☆34 Updated 2 years ago
- ☆29 Updated 4 years ago
- Aiohttp web server API, which scrapes Google and returns scrape results as response. Supports proxies, multiple geos and number of result… ☆56 Updated last year
- Detect and classify pagination links ☆103 Updated 4 years ago
- Takes normal text as input and generates SQL commands using OpenAI's GPT-3 ☆15 Updated 4 years ago
- A complementary proxy that helps use SPM with headless browsers ☆108 Updated 2 years ago
- List of free and checked http, https, socks4 and socks5 proxies ☆11 Updated this week
- Generate product descriptions, blogs, ads and more using GPT architecture with a single request to TextCortex API a.k.a Hemingwai ☆40 Updated 2 years ago
- Neural Elastic Inference and Search ☆19 Updated 5 years ago
- Library that helps use puppeteer in scrapy. ☆52 Updated last month
- URL Inspection Tool Automator ☆24 Updated 2 years ago
- LinkRun - Data Engineering project done in 3 weeks during the Insight fellowship ☆38 Updated 5 years ago
- Dockerfile and web server for running GPT-J-6B on AWS GPU instances ☆18 Updated 3 years ago
- Zyte Automatic Extraction integration for Scrapy ☆56 Updated 3 years ago
- Matrix-based News Aggregation to Explore Media Bias ☆20 Updated 6 years ago
- Extract text from HTML ☆135 Updated 4 years ago
- Wafer-thin FlaskGPT on Vercel. ☆12 Updated 2 years ago
- This project experiments with the Google NLP Algorithm to evaluate e-commerce product descriptions from an SEO perspective. ☆17 Updated 4 years ago
- ☆22 Updated 7 months ago
- Common Crawl Index Server ☆68 Updated 3 months ago
- PostHog with text analytics extensions, serving as an advanced LLM analytics platform. ☆12 Updated 8 months ago
- Demo example of consumer goods categorization ☆28 Updated last year
- Word analysis, by domain, on the Common Crawl data set for the purpose of finding industry trends ☆56 Updated last year
- Site Hound (previously THH) is a Domain Discovery Tool ☆23 Updated 4 years ago
- AI based web-wrapper for web-content-extraction ☆100 Updated 2 years ago
- Application configuration and scripts for search on https://docs.vespa.ai/ ☆12 Updated this week
- A Google Trends Analytics Package ☆13 Updated last year
- Remove duplicate documents/videos/images via popular algorithms such as SimHash, SpotSig, Shingling, etc. ☆18 Updated last year
- Sentence Embedding as a Service ☆15 Updated last year
- Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provi… ☆37 Updated 2 months ago