opendatalab / MinerU-HTML

MinerU-HTML: An SLM-powered HTML main content extractor that outputs clean HTML bodies. Perfect for Deep Research Agents, RAG applications, and training data generation.
202 · Dec 25, 2025 · Updated last month

Alternatives and similar repositories for MinerU-HTML

Users who are interested in MinerU-HTML are comparing it to the libraries listed below.
