MinerU-HTML: An SLM-powered HTML main content extractor that outputs clean HTML bodies. Perfect for Deep Research Agents, RAG applications, and training data generation.
☆212Dec 25, 2025Updated 2 months ago
Alternatives and similar repositories for MinerU-HTML
Users who are interested in MinerU-HTML are comparing it to the libraries listed below.
- Reading order, Layoutreader☆19May 8, 2025Updated 10 months ago
- Youtu-Parsing: Perception, Structuring and Recognition via High-Parallelism Decoding☆58Feb 10, 2026Updated last month
- The world’s first science-focused human-AI Agent collaborative discussion community.☆45Updated this week
- ☆34Jan 2, 2024Updated 2 years ago
- ☆13Feb 20, 2026Updated 2 weeks ago
- Use Diffree in ComfyUI☆31Mar 9, 2025Updated last year
- MPB (Miner-PDF-Benchmark) is an end-to-end PDF document comprehension evaluation suite designed for large-scale model data scenarios.☆24Dec 11, 2024Updated last year
- A Python package for interacting with the MinerU Vision-Language Model.☆108Feb 5, 2026Updated last month
- A clean beamer/ltx-talk theme with a big title graphic☆20Mar 2, 2026Updated last week
- This is the repo for CROssBARv2 Knowledge Graph data. CROssBARv2 is a heterogeneous general-purpose biomedical KG-based system.☆11Feb 4, 2026Updated last month
- This repository defines a python class that can be used to load data for the tf.keras.model.fit_generator function by using a torch.utils…☆11Oct 26, 2024Updated last year
- ☆10Oct 18, 2021Updated 4 years ago
- Telegram bot framework written in PHP for OpenWRT☆12Nov 27, 2022Updated 3 years ago
- Talk directly to your data☆10Jul 18, 2023Updated 2 years ago
- The first French corpus comprising financial reports☆13Jun 23, 2020Updated 5 years ago
- SSL/TLS Workshop/Reference Guide☆10May 2, 2018Updated 7 years ago
- WIP; specification of a format for communicating streams of HTTP requests and responses☆14Mar 29, 2024Updated last year
- audio, NLP, ML with huggingface, nvidia/nemo, speechbrain☆11Sep 4, 2023Updated 2 years ago
- ☆30Jan 8, 2026Updated 2 months ago
- xLSTMAD - Powerful xLSTM based Method for Anomaly Detection☆15Mar 1, 2026Updated last week
- A collection of various discourse segmenters☆10Jun 30, 2017Updated 8 years ago
- Go implementation of the Retro framework☆14Feb 14, 2023Updated 3 years ago
- A minimal implementation of spotify/annoy in pure rust☆11Mar 2, 2023Updated 3 years ago
- ☆12Jan 27, 2026Updated last month
- Music source separation with Demucs in ComfyUI☆10Dec 12, 2025Updated 2 months ago
- ☆13Aug 1, 2023Updated 2 years ago
- Indonesian law dataset containing section annotation of court decision documents☆17Jul 7, 2022Updated 3 years ago
- ☆16Sep 4, 2025Updated 6 months ago
- Transforms trx file into html☆10Jun 29, 2021Updated 4 years ago
- A simple code generator of JSON marshaler for go and tinygo.☆10Feb 9, 2026Updated last month
- A very easy-to-use wrapper of Duktape JavaScript engine, including wrappers for C, Go and Java. The bridge wrapper is also supporting mo…☆14Dec 20, 2021Updated 4 years ago
- An email segmentation system (reference implementation of ECIR 2018 paper)☆10Oct 21, 2019Updated 6 years ago
- A demo environment that creates a GitHub repo, a TF Workspace and a Vault namespace, all integrated.☆11Mar 14, 2024Updated last year
- ☆11Dec 26, 2022Updated 3 years ago
- A warehouse of semantic segmentation models that can be used to train on your own image datasets for segmentation tasks.☆14Feb 18, 2025Updated last year
- miaoshouai-assistant for webui-forge☆15Aug 15, 2024Updated last year
- Adds alert blockquote support to VS Code's built-in markdown preview☆13Dec 2, 2023Updated 2 years ago
- A platform for collecting, analyzing, and visualizing social media data.☆13Dec 27, 2020Updated 5 years ago
- A text analysis library for relevance and subtheme detection☆16Sep 22, 2025Updated 5 months ago