get-set-fetch / scraper
Node.js web scraper. Includes a command-line interface, a Docker container, a Terraform module, and Ansible roles for distributed cloud scraping. Supported databases: SQLite, MySQL, PostgreSQL. Supported headless clients: Puppeteer, Playwright, Cheerio, jsdom.
☆112 · Updated 2 years ago
Alternatives and similar repositories for scraper
Users who are interested in scraper are comparing it to the libraries listed below.
- Web scraping extension ☆84 · Updated 4 months ago
- House of Apify Scrapers. Generic scraping actors with a simple UI to handle complex web crawling and scraping use cases. ☆128 · Updated this week
- Base Docker images for Apify actors. ☆87 · Updated this week
- Browser automation API for repetitive web-based tasks, with a friendly user interface. You can use it to scrape content or do many other … ☆31 · Updated 3 years ago
- Amazon products scraper using rotating proxies and headless Chrome from ScrapingAnt ☆87 · Updated last year
- Web data extraction tool implemented as a Chrome extension with many more features ☆46 · Updated 7 years ago
- Web data extraction tool implemented as a Chrome extension ☆267 · Updated 2 months ago
- Email automation driven by headless Chrome. ☆168 · Updated 4 years ago
- Extracts email addresses from arbitrary text input. ☆64 · Updated 9 months ago
- Node.js library and CLI for scraping websites using Puppeteer (or not) and YAML definitions ☆47 · Updated 2 years ago
- Evaluate JavaScript on a URL through a headless Chrome browser. ☆25 · Updated 4 years ago
- Automated functional testing via the Chrome DevTools Protocol. Easy to use and open source. Generates unique CSS and XPath selectors. Out… ☆58 · Updated 4 years ago
- Module that extracts structured information from a rendered HTML site and outputs JSON. HTML to JSON. ☆70 · Updated 4 years ago
- PixieBrix browser extension ☆87 · Updated 11 months ago
- Cloud crawler functions for scrapeulous ☆45 · Updated 4 years ago
- Chromium Browser Automation (extension for Chrome browser automation). ☆124 · Updated last year
- Chromium / Puppeteer site crawler ☆48 · Updated 5 years ago
- Grammarify is an npm package that safely cleans up text that has misspellings, improper capitalization, lexical illusions, among other thin… ☆73 · Updated 2 years ago
- An alternative to sticking that lovely web app into an <iframe> on a corp website ☆50 · Updated 3 years ago
- Simple proxy rotation service ☆30 · Updated 10 years ago
- A self-hosted dashboard and API to share service ports with the team. ☆32 · Updated 2 years ago
- Amazon affiliate link storefront powered by a Reddit scraper ☆14 · Updated 8 years ago
- Man in the middle using Playwright ☆29 · Updated 2 years ago
- Parses OTP messages for a verification code and service provider. ☆24 · Updated 2 years ago
- Util for generating random sentences, paragraphs and articles in English ☆88 · Updated last year
- KeepLink is a simple bookmark service with tags and archive built with Supabase and Next.js. It doesn't have any social sharing feature a… ☆70 · Updated 2 years ago
- 🔥 Discover trending videos from Reddit and curated YouTube channels – Soon using Next.js. See `dev` branch ☆15 · Updated this week
- You can use this act to monitor any page's content and get a notification when content changes. ☆22 · Updated 3 years ago
- Web scraper using Cloudflare Workers ☆26 · Updated 4 years ago
- A plugin for puppeteer-extra to add proxy support ☆18 · Updated 2 years ago