A Python package for interacting with the MinerU Vision-Language Model.
☆108Feb 5, 2026Updated last month
Alternatives and similar repositories for mineru-vl-utils
Users that are interested in mineru-vl-utils are comparing it to the libraries listed below
- MPB (Miner-PDF-Benchmark) is an end-to-end PDF document comprehension evaluation suite designed for large-scale model data scenarios.☆24Dec 11, 2024Updated last year
- Reading order, Layoutreader☆19May 8, 2025Updated 10 months ago
- DELT: Data Efficacy for Language Model Training☆43Feb 12, 2026Updated 3 weeks ago
- UniMERNet: A Universal Network for Real-World Mathematical Expression Recognition☆458Sep 28, 2025Updated 5 months ago
- Large-Scale High-quality Chinese Web Text with Multi-dimensional and fine-grained information☆38Dec 2, 2024Updated last year
- ☆55Updated this week
- Diffusion Model Improvement Method☆35Sep 4, 2023Updated 2 years ago
- ☆18Feb 16, 2025Updated last year
- ☆23Dec 11, 2025Updated 2 months ago
- Math24o: High School Olympiad Mathematics Chinese Benchmark, an evaluation set of high school Olympiad mathematics competition problems☆11Mar 27, 2025Updated 11 months ago
- This repository contains the code for the Transformer-Representation Neural Topic Model (TNTM) based on the paper "Probabilistic Topic Mo…☆12Jul 6, 2024Updated last year
- A record of useful Git repos☆12Jul 28, 2024Updated last year
- ☆11Oct 31, 2024Updated last year
- A High-efficiency Open-source Toolkit for Table-to-Latex Task☆275Dec 6, 2025Updated 3 months ago
- Data browser based on S3: a visualization tool for S3-hosted data (json / jsonl / parquet / html / md, etc.).☆79Nov 11, 2025Updated 3 months ago
- This repository provides the code for applying Contrastive Learning Penalty Loss (CLPL) and Mixture of Experts (MoE) to the BGE-M3 text e…☆11Dec 27, 2024Updated last year
- A helper package to get information of scholarly articles from DBLP using its public API☆15May 13, 2025Updated 9 months ago
- Code for paper: "Executing Arithmetic: Fine-Tuning Large Language Models as Turing Machines"☆11Oct 11, 2024Updated last year
- Implementation (in progress) of Dieng et al.'s TopicRNN intended to be used as a baseline and starting point.☆10Jun 26, 2018Updated 7 years ago
- Long Context Research☆29Jan 26, 2026Updated last month
- A mesh system for adapting multiple large language models.☆11Mar 20, 2024Updated last year
- Deep Learning from Scratch 2! In progress in Pangyo <3☆12Aug 20, 2019Updated 6 years ago
- E-book on industrial-grade Chinese speech recognition systems☆13Oct 30, 2020Updated 5 years ago
- ☆18Jun 14, 2025Updated 8 months ago
- ☆11Aug 27, 2020Updated 5 years ago
- Code for the "Long Context Needs Some R&R" paper.☆12Mar 11, 2024Updated last year
- ☆13Apr 2, 2024Updated last year
- Official Pytorch implementation of "Omni-AVSR: Towards Unified Multimodal Speech Recognition with Large Language Models" [IEEE ICASSP 202…☆30Jan 18, 2026Updated last month
- A local search system implementation using Elasticsearch for Wikipedia data indexing and retrieval.☆12May 17, 2025Updated 9 months ago
- "Towards Improving Document Understanding: An Exploration on Text-Grounding via MLLMs" 2023☆16Nov 28, 2024Updated last year
- Part of my collection of Python technical books☆13Sep 26, 2013Updated 12 years ago
- Named entity recognition combined with rules from an entity dictionary☆13Aug 25, 2020Updated 5 years ago
- ☆13Feb 14, 2024Updated 2 years ago
- Converts documents (pdf/html/doc/docx/ppt/pptx) to Markdown☆48Jul 23, 2024Updated last year
- Data Set Description Language Specification (DSDL, a next-generation AI dataset description language)☆46May 29, 2024Updated last year
- Knowledge graph module of the Chatbot_CN project☆12Mar 27, 2020Updated 5 years ago
- The official implementation of CTRNet++.☆14Dec 30, 2024Updated last year
- WanJuan-CC is high-quality data built from CommonCrawl through data extraction, rule-based cleaning, deduplication, safety filtering, quality cleaning, and other steps.☆14Apr 18, 2024Updated last year
- Baseline sharing: Internet news sentiment analysis☆11Oct 12, 2019Updated 6 years ago