MinerU-HTML: An SLM-powered HTML main content extractor that outputs clean HTML bodies. Perfect for Deep Research Agents, RAG applications, and training data generation.
☆218Mar 24, 2026Updated this week
Alternatives and similar repositories for MinerU-HTML
Users who are interested in MinerU-HTML are comparing it to the libraries listed below.
- Youtu-Parsing: Perception, Structuring and Recognition via High-Parallelism Decoding☆60Feb 10, 2026Updated last month
- ☆16Sep 4, 2025Updated 6 months ago
- A Python package for interacting with the MinerU Vision-Language Model.☆109Updated this week
- Data Set Description Language Specification (DSDL, a next-generation AI dataset description language)☆46May 29, 2024Updated last year
- SDK of OpenDataLab - https://opendatalab.org.cn☆59Jul 31, 2025Updated 7 months ago
- ☆29May 13, 2024Updated last year
- Reading order, Layoutreader☆19May 8, 2025Updated 10 months ago
- TRivia: Self-supervised Fine-tuning of Vision-Language Models for Table Recognition☆29Feb 5, 2026Updated last month
- Pin files for contextual, codebase-level AI assistance.☆16Jul 11, 2024Updated last year
- MPB (Miner-PDF-Benchmark) is an end-to-end PDF document comprehension evaluation suite designed for large-scale model data scenarios.☆24Dec 11, 2024Updated last year
- UniMERNet: A Universal Network for Real-World Mathematical Expression Recognition☆460Sep 28, 2025Updated 6 months ago
- Awesome Long-CoT Data☆19Mar 26, 2025Updated last year
- ☆33Jul 15, 2025Updated 8 months ago
- The Code and Script of "David's Slingshot: A Strategic Coordination Framework of Small LLMs Matches Large LLMs in Data Synthesis"☆34Jun 13, 2025Updated 9 months ago
- GameStream client for iOS/tvOS☆14May 3, 2024Updated last year
- ☆13Oct 11, 2024Updated last year
- ☆34Jan 2, 2024Updated 2 years ago
- Adds alert blockquote support to VS Code's built-in markdown preview☆13Dec 2, 2023Updated 2 years ago
- This is the repo for CROssBARv2 Knowledge Graph data. CROssBARv2 is a heterogeneous general-purpose biomedical KG-based system.☆11Feb 4, 2026Updated last month
- ☆13Feb 20, 2026Updated last month
- Backend code for an automated bid-document writing tool; open source and free to use☆31Jun 11, 2025Updated 9 months ago
- ☆21Apr 9, 2025Updated 11 months ago
- Data annotation component library, provided as NPM packages☆147Mar 18, 2026Updated last week
- A clean beamer/ltx-talk theme with a big title graphic☆21Mar 10, 2026Updated 2 weeks ago
- Tools for OpenDataArena: Fair, Open, and Transparent Arena for Data☆137Mar 15, 2026Updated 2 weeks ago
- Check SSL certificate status. Uses Python to check a website's SSL certificate validity period and issuing authority.☆17Mar 29, 2025Updated last year
- Data annotation toolbox supports image, audio and video data.☆1,524Mar 20, 2026Updated last week
- A re-typeset edition of "Fallout: Equestria - Pink Eyes"☆12Oct 11, 2019Updated 6 years ago
- ncnn is a high-performance neural network inference framework optimized for the mobile platform☆14May 20, 2022Updated 3 years ago
- REST API for Large Language Models using FastAPI, Redis and LiteLLM☆14Nov 30, 2023Updated 2 years ago
- [UNMAINTAINED] Nginx and PHP config for Mac OS X☆11Jan 28, 2013Updated 13 years ago
- ☆12Feb 16, 2023Updated 3 years ago
- Using OpenVINO to speed up inference of PaddleOCR-VL model☆26Updated this week
- Single Cell Pretrained Regulatory network INference from Transcripts☆11Sep 17, 2024Updated last year
- Advertising Skills for Open Claw, Claude Code & AI agents. Direct response, paid ads, funnels, and copy systems.☆372Updated this week
- Fast way to switch between Claude Code configuration profiles☆63Updated this week
- Electricity bill monitoring for Wuhan University of Technology☆11Oct 5, 2022Updated 3 years ago
- A CLI tool and library written in Go for converting documents to Markdown format.☆24Sep 27, 2025Updated 6 months ago
- Laravel 5 Google Forms - Pull requests are welcome!☆12Jan 18, 2021Updated 5 years ago