stanzhai/Html2Article

stanzhai / Html2Article

HTML web page main-content extraction

☆496

Alternatives and similar repositories for Html2Article

Users who are interested in Html2Article are comparing it to the libraries listed below.

rainyear / cix-extractor-py
General-purpose extraction of web page main content (and images) based on the line-block distribution function - Python version
☆114 · Sep 22, 2016 · Updated 9 years ago
reorx / cx-extractor
Automatically exported from code.google.com/p/cx-extractor
☆29 · Apr 1, 2015 · Updated 11 years ago
stanzhai / ScrapingSpider
A crawler developed in spare time, with support for multithreading, keyword filtering, and intelligent recognition of main content.
☆79 · Mar 26, 2013 · Updated 13 years ago
url2io / url2io-python-sdk
⛔ [DEPRECATED] URL2io Python SDK, for web page information extraction such as main-content extraction
☆41 · Dec 5, 2020 · Updated 5 years ago
MRLuowen / GrabContent
An improved Java version of the line-block-based main-content extraction algorithm
☆16 · Aug 20, 2014 · Updated 11 years ago
ahkimkoo / arex
node.js article extractor, automatic summarization.
☆31 · Dec 6, 2021 · Updated 4 years ago
hfut-dmic / ContentExtractor
An algorithm for automatically extracting web page main content, implemented in Java
☆111 · Apr 18, 2017 · Updated 9 years ago
srijiths / readabilityBUNDLE
A bundle of html content extraction algorithms
☆121 · Mar 27, 2015 · Updated 11 years ago
intohole / sixgod
Main-content extraction | extract content from html
☆22 · May 18, 2017 · Updated 9 years ago
heavysheep / webEYE
Identifies and extracts the main content, title, time, and other elements from static web pages built on different templates
☆15 · Dec 28, 2016 · Updated 9 years ago
url2io / url2io-app-samples
App samples using the URL2io API; demonstrates how to use the URL2io API to extract main content from web pages
☆45 · Jul 18, 2024 · Updated 2 years ago
mylukin / Textractor
An efficient class library for extracting main content from HTML.
☆51 · May 17, 2017 · Updated 9 years ago
CrawlScript / WebCollector
WebCollector is an open source web crawler framework based on Java. It provides some simple interfaces for crawling the Web, you can setup …
☆3,085 · Feb 10, 2026 · Updated 5 months ago
chrislinan / cx-extractor-python
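Python implementation of the general-purpose web page main-content extraction algorithm based on the line-block distribution function, with added English support / Web page content extraction algorithm, support both Chinese and English
☆482 · Jul 9, 2019 · Updated 7 years ago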
tianshiyeben / draw
Extracts the title, time, and main content of news article pages, with no configuration required
☆17 · Aug 19, 2016 · Updated 9 years ago
amumu-dev / cx-extractor
clone of https://code.google.com/p/cx-extractor
☆37 · Sep 26, 2013 · Updated 12 years ago
grangier / python-goose
Html Content / Article Extractor, web scraping lib in Python
☆4,101 · Mar 10, 2026 · Updated 4 months ago
RainmanJin / HTMLContentExtractor
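Extraction of web page main content and its images, based on the Harbin Institute of Technology algorithm "General-Purpose Web Page Main-Content Extraction Based on the Line-Block Distribution Function"
☆11 · Jan 22, 2016 · Updated 10 years ago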
buriy / python-readability
fast python port of arc90's readability tool, updated to match latest readability.js!
☆2,894 · Jan 26, 2026 · Updated 5 months ago
ximing / WeChatWallClient.NET
WeChat wall, .NET version
☆12 · Jan 18, 2015 · Updated 11 years ago
GeneralNewsExtractor / GeneralNewsExtractor
General-purpose main-content extractor for news web pages, Beta version.
☆3,788 · Apr 21, 2026 · Updated 3 months ago
Jeff-Klein / String.Extensions
Provides custom methods to C# String type
☆13 · Jan 17, 2020 · Updated 6 years ago
kshoji / STLViewer
STL Viewer app for Android
☆12 · Nov 10, 2018 · Updated 7 years ago
bfoz / stl-ruby
Read and write STL files
☆14 · Jan 17, 2015 · Updated 11 years ago
enspiral-cherubi / three-stl-loader
@aleeper's THREE.STLLoader repackaged as a node module
☆13 · Feb 21, 2018 · Updated 8 years ago
dotnetcore / DotnetSpider
DotnetSpider, a .NET standard web crawling library. It is a lightweight, efficient, and fast high-level web crawling & scraping framework
☆4,144 · Apr 3, 2026 · Updated 3 months ago
timbertson / python-readability
[abandoned] python port of arc90's readability bookmarklet
☆542 · Jun 16, 2011 · Updated 15 years ago
kkaefer / utf7
Converts text to and from UTF-7 (RFC 2152 and IMAP).
☆13 · Nov 4, 2023 · Updated 2 years ago
ZhaoYis / Berry.Spider
A crawler program built on the Selenium automation framework (currently mainly covering Baidu, Toutiao, and Sogou)
☆15 · Jul 5, 2026 · Updated 2 weeks ago
wenson / proxypool
This project provides an HTTP proxy pool for use when you want an HTTP proxy server.
☆52 · Mar 7, 2014 · Updated 12 years ago
kn007 / Reduce-Shrink-Purge-the-ibdata1-file-in-MySQL
This project could help to reduce the ibdata1 file size.
☆10 · Dec 31, 2017 · Updated 8 years ago
fxsjy / jparser
A readability parser which can extract title, content, images from html pages
☆86 · May 29, 2020 · Updated 6 years ago
dotnet9 / CsharpSocketTest
C# socket tests: a study of binary object serialization, TCP/UDP network transfer, and loading and refreshing large data sets in WPF\AvaloniaUI ListView\DataGrid
☆14 · Feb 13, 2026 · Updated 5 months ago
anderscui / jieba.NET
.NET version of the jieba Chinese word segmenter (supports .NET Framework and .NET Core)
☆1,146 · Dec 8, 2022 · Updated 3 years ago
hailong0707-zz / spider_news_gov
Scrapy Spider for China's National Development and Reform Commission (中国发展改革委员会)
☆13 · Nov 17, 2014 · Updated 11 years ago
RabbitTeam / exceptionless-webhooks
exceptionless webhook
☆26 · Nov 26, 2018 · Updated 7 years ago
msigut / FreeProxySharp
.NET Core Proxy library based on HttpClient that works with FreeProxyList.net
☆20 · Dec 8, 2022 · Updated 3 years ago
kingwkb / readability
a python readability
☆277 · Jun 22, 2017 · Updated 9 years ago
karussell / snacktory
Readability clone in Java
☆462 · Oct 13, 2020 · Updated 5 years ago