lorey/mlscraper

🤖 Scrape data from HTML websites automatically by just providing examples

☆1,387

Alternatives and similar repositories for mlscraper

Users who are interested in mlscraper are comparing it to the libraries listed below.

alirezamika / autoscraper
A Smart, Automatic, Fast and Lightweight Web Scraper for Python
☆7,657 · Jun 9, 2025 · Updated last year
Boris-code / feapder
🚀🚀🚀feapder is an easy to use, powerful Python crawler framework. It has four built-in crawler types, AirSpider, Spider, TaskSpider, and BatchSpider…
☆3,723 · Jul 7, 2026 · Updated 2 weeks ago
adbar / trafilatura
Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XM…
☆6,318 · Updated this week
bee-san / pyWhat
🐸 Identify anything. pyWhat easily lets you identify emails, IP addresses, and more. Feed it a .pcap file or some text and it'll tell …
☆7,272 · Oct 31, 2023 · Updated 2 years ago
ScrapeGraphAI / Scrapegraph-ai
Python scraper based on AI
☆28,509 · Updated this week
apify / crawlee
Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data …
☆24,824 · Updated this week
crawlab-team / crawlab
Distributed web crawler admin platform for managing spiders regardless of language or framework.
☆12,246 · Feb 10, 2026 · Updated 5 months ago
roniemartinez / dude
dude uncomplicated data extraction: A simple framework for writing web scrapers using Python decorators
☆426 · Mar 16, 2025 · Updated last year
jjonescz / awe
AI-based web extractor
☆12 · Feb 25, 2023 · Updated 3 years ago
MohamedHmini / iww
AI based web-wrapper for web-content-extraction
☆102 · Feb 6, 2023 · Updated 3 years ago
q-m / scrapyd-k8s
Scrapyd on container infrastructure
☆16 · May 29, 2026 · Updated last month
illacloud / illa-builder
Low-code platform allows you to build business apps, enables you to quickly create internal tools such as dashboard, crud app, admin pane…
☆12,297 · May 27, 2026 · Updated last month
lixi5338619 / magical_spider
Magical spider 🕷, a scraping solution that works for almost all web sites
☆349 · Aug 23, 2022 · Updated 3 years ago
GeneralNewsExtractor / GeneralNewsExtractor
General-purpose extractor for the main body text of news web pages, Beta version.
☆3,788 · Apr 21, 2026 · Updated 2 months ago
neuml / txtai
💡 All-in-one AI framework for semantic search, LLM orchestration and language model workflows
☆12,733 · Updated this week
jamesturk / scrapeghost
👻 Experimental library for scraping websites using OpenAI's GPT API.
☆1,443 · Jan 14, 2026 · Updated 6 months ago
multiprocessio / datastation
App to easily query, script, and visualize data from every database, file, and API.
☆2,958 · Nov 10, 2023 · Updated 2 years ago
AtuboDad / playwright_stealth
playwright stealth
☆975 · Jul 29, 2024 · Updated last year
Germey / AwesomeWebScraping
List of libraries, tools and APIs for web scraping and data processing.
☆266 · Mar 12, 2026 · Updated 4 months ago
ArchiveBox / ArchiveBox
🗃 Open source self-hosted web archiving. Takes URLs/browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and mor…
☆27,989 · Updated this week
scrapedia / scrapy-pipelines
A collection of pipelines for Scrapy
☆16 · Apr 27, 2026 · Updated 2 months ago
microsoft / playwright-python
Python version of the Playwright testing and automation library.
☆14,838 · Updated this week
lorien / awesome-web-scraping
List of libraries, tools and APIs for web scraping and data processing.
☆7,983 · Jul 12, 2026 · Updated last week
GitHubDaily / GitHubDaily
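Regularly sharing high-quality, interesting, and practical open-source technical tutorials, developer tools, programming websites, and tech news from GitHub. A list of cool, interesting projects on GitHub.
☆47,134 · Dec 31, 2025 · Updated 6 months ago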
mli / autocut
Cut videos with a text editor
☆7,764 · Oct 5, 2024 · Updated last year
Gerapy / GerapyAutoExtractor
Auto Extractor Module
☆338 · Aug 19, 2024 · Updated last year
Vucko95 / Computer-Science-Notes-Only-Source-Code-
This repo contains only source code for computer science course.
☆20 · Nov 1, 2020 · Updated 5 years ago
visualpython / visualpython
GUI-based Python code generator for data science, extension to Jupyter Lab, Jupyter Notebook and Google Colab.
☆917 · Jul 3, 2024 · Updated 2 years ago
nocodb / nocodb
🔥 🔥 🔥 A Free & Self-hostable Airtable Alternative
☆64,148 · Updated this week
Sanster / IOPaint
Image inpainting tool powered by SOTA AI Model. Remove any unwanted object, defect, people from your pictures or erase and replace(powere…
☆23,323 · Apr 29, 2025 · Updated last year
AutomaApp / automa
A browser extension for automating your browser by connecting blocks
☆21,490 · Mar 2, 2026 · Updated 4 months ago
crawlab-team / webspot
An intelligent web service to automatically detect web content and extract information from it.
☆86 · Jul 13, 2023 · Updated 3 years ago
reflex-dev / reflex
🕸️ Web apps in pure Python 🐍
☆28,663 · Updated this week
tommyequaker8354 / yuzu-emulator-pc-nintendo-switch
Yuzu Emulator PC Nintendo Switch (2026) is the premier utility for advanced hybrid console emulation. Experience high-speed 4K performanc…
☆38 · May 7, 2026 · Updated 2 months ago
codelucas / newspaper
newspaper3k is a news, full-text, and article metadata extraction library in Python 3.
☆15,114 · Jul 8, 2026 · Updated last week
TheWebScrapingClub / webscraping-from-0-to-hero
The web scraping open project repository aims to share knowledge and experiences about web scraping with Python
☆1,729 · May 27, 2024 · Updated 2 years ago
obsei / obsei
Obsei is a low code AI powered automation tool. It can be used in various business flows like social listening, AI based alerting, brand …
☆1,416 · Feb 4, 2026 · Updated 5 months ago
chriskiehl / Gooey
Turn (almost) any Python command line program into a full GUI application with one line
☆21,907 · Mar 23, 2026 · Updated 3 months ago
jmriebold / BoilerPy3
Python port of Boilerpipe library
☆96 · Aug 20, 2024 · Updated last year