opendatalab / MinerU-HTML
MinerU-HTML: An SLM-powered HTML main content extractor that outputs clean HTML bodies. Perfect for Deep Research Agents, RAG applications, and training data generation.
242 · Mar 27, 2026 · Updated last month

Alternatives and similar repositories for MinerU-HTML

Users that are interested in MinerU-HTML are comparing it to the libraries listed below.
