opendatalab / MinerU-HTML
MinerU-HTML: An SLM-powered HTML main content extractor that outputs clean HTML bodies. Perfect for Deep Research Agents, RAG applications, and training data generation.
218 · Mar 24, 2026 · Updated this week

Alternatives and similar repositories for MinerU-HTML

Users who are interested in MinerU-HTML are comparing it to the libraries listed below.
