MinerU-HTML: An SLM-powered HTML main content extractor that outputs clean HTML bodies. Perfect for Deep Research Agents, RAG applications, and training data generation.
☆260Mar 27, 2026Updated 2 months ago
Alternatives and similar repositories for MinerU-HTML
Users who are interested in MinerU-HTML are comparing it to the libraries listed below.
- Youtu-Parsing: Perception, Structuring and Recognition via High-Parallelism Decoding☆70Updated this week
- ☆16Sep 4, 2025Updated 9 months ago
- Official code of the paper: Draw an Audio: Leveraging Multi-Instruction for Video-to-Audio Synthesis.☆45Sep 11, 2024Updated last year
- A Python package for interacting with the MinerU Vision-Language Model.☆131Jun 11, 2026Updated last week
- SDK of OpenDataLab - https://opendatalab.org.cn☆60Jul 31, 2025Updated 10 months ago
- ☆30May 13, 2024Updated 2 years ago
- Reading order, Layoutreader☆18May 8, 2025Updated last year
- [ICLR 2025] This is the official implementation for the paper: "Large Language Models Meet Symbolic Provers for Logical Reasoning Evaluat…☆45Jun 11, 2025Updated last year
- Tests the GSM8K score of https://huggingface.co/OFA-Sys/gsm8k-rft-llama7b-u13b ☆15Aug 10, 2023Updated 2 years ago
- Pin files for contextual, codebase-level AI assistance.☆16Jul 11, 2024Updated last year
- (CVPR 2026) TRivia: Self-supervised Fine-tuning of Vision-Language Models for Table Recognition☆34Feb 5, 2026Updated 4 months ago
- UniMERNet: A Universal Network for Real-World Mathematical Expression Recognition☆484Sep 28, 2025Updated 8 months ago
- This is the official project of paper: Compress to Impress: Unleashing the Potential of Compressive Memory in Real-World Long-Term Conver…☆25Nov 18, 2024Updated last year
- This is the repo for the paper Multi-Agent Collaborative Data Selection for Efficient LLM Pretraining.☆48Aug 22, 2025Updated 9 months ago
- The Code and Script of "David's Slingshot: A Strategic Coordination Framework of Small LLMs Matches Large LLMs in Data Synthesis"☆34Jun 13, 2025Updated last year
- ☆13Jun 17, 2021Updated 5 years ago
- 'Live in Night City' is a Cyberpunk 2077 Mod powered by Cyber Engine Tweaks.☆10Oct 21, 2023Updated 2 years ago
- F4SE plugin code for Garbage Collector Bug Fix (GCBugFix).☆11Jul 18, 2024Updated last year
- This is the repo for CROssBARv2 Knowledge Graph data. CROssBARv2 is a heterogeneous general-purpose biomedical KG-based system.☆14Feb 4, 2026Updated 4 months ago
- ☆31Dec 6, 2024Updated last year
- Open-source multimodal data annotation platform with AI auto-annotation support.☆1,590Jun 8, 2026Updated last week
- Using GPT to parse PDF☆101Sep 6, 2024Updated last year
- Insurance AI Assistant A smart system combining PostgreSQL, Milvus, and specialized AI agents (Life/Home/Auto) to answer insurance querie…☆30Apr 29, 2025Updated last year
- ☆16Dec 16, 2024Updated last year
- ☆20Aug 11, 2025Updated 10 months ago
- ☆22Apr 9, 2025Updated last year
- Tools for OpenDataArena: Fair, Open, and Transparent Arena for Data☆144Mar 15, 2026Updated 3 months ago
- A re-typeset edition of "Fallout: Equestria - Pink Eyes"☆12Oct 11, 2019Updated 6 years ago
- REST API for Large Language Models using FastAPI, Redis and LiteLLM☆14Nov 30, 2023Updated 2 years ago
- ☆26Apr 21, 2026Updated last month
- 🎮 MQTT Client Extension for Playnite☆15Mar 29, 2024Updated 2 years ago
- Automagical Rust binding to RED4ext☆18Jun 11, 2026Updated last week
- SECOM: On Memory Construction and Retrieval for Personalized Conversational Agents, ICLR 2025☆58Mar 1, 2025Updated last year
- Single Cell Pretrained Regulatory network INference from Transcripts☆11Sep 17, 2024Updated last year
- High-performance Qwen3-TTS implementation | Instruction-driven · Zero-shot voice cloning · Streaming · RTF 0.55☆64Apr 4, 2026Updated 2 months ago
- This is a read-only mirror of the CRAN R package repository. GOplot — Visualization of Functional Analysis Data. Homepage: https://gith…☆15Mar 30, 2016Updated 10 years ago
- Template for creating a BioCypher-driven knowledge graph☆13Jan 15, 2026Updated 5 months ago
- A browser-based generative text art engine that takes string of text and transforms them into typographic doodles.☆37Mar 1, 2026Updated 3 months ago
- ☆61Jun 15, 2025Updated last year