opendatalab / MinerU-HTML
MinerU-HTML: An SLM-powered HTML main content extractor that outputs clean HTML bodies. Perfect for Deep Research Agents, RAG applications, and training data generation.
212 | Dec 25, 2025 | Updated 2 months ago

Alternatives and similar repositories for MinerU-HTML

Users who are interested in MinerU-HTML are comparing it to the libraries listed below.

