SWHL/ChineseDocumentPDF

PDF data of Chinese academic papers, securities documents, and financial reports

☆41

Alternatives and similar repositories for ChineseDocumentPDF

Users who are interested in ChineseDocumentPDF are comparing it to the repositories listed below.

360AILAB-NLP / 360LayoutAnalysis
360LayoutAnalysis, a series of document analysis models and datasets developed by 360 AI Research Institute
☆305 · Sep 10, 2024 · Updated last year
SWHL / TrOCR-Formula-Rec
Trains a small, elegant formula recognition model based on TrOCR and the UniMER-1M dataset
☆30 · Mar 17, 2026 · Updated 4 months ago
AILab-UniFI / cte-dataset
CTE: Contextualized Table Extraction Dataset
☆17 · Feb 23, 2023 · Updated 3 years ago
ArchieAlexArkhipov / Cycle-CenterNet
CycleCenternet based on MMDetection
☆22 · Jun 28, 2023 · Updated 3 years ago
ali-vilab / CAPability
What Is a Good Caption? A Comprehensive Visual Caption Benchmark for Evaluating Both Correctness and Thoroughness
☆28 · May 16, 2025 · Updated last year
Form2Seq-Data / Dataset
Dataset corresponding to the paper: "Form2Seq: A Framework for Higher-Order Form Structure Extraction"
☆10 · Feb 17, 2021 · Updated 5 years ago
jiangnanboy / onnx-java
onnx-java: loads ONNX models in Java and runs inference.
☆21 · May 19, 2022 · Updated 4 years ago
daniel89710 / trt-depth-anything
TensorRT depth-anything for anyone and anywhere
☆16 · Jan 29, 2024 · Updated 2 years ago
hiroi-sora / GapTree_Sort_Algorithm
[Gap Tree Sorting Algorithm] Performs layout analysis on OCR results or text extracted from PDFs and sorts it in human reading order.
☆167 · Feb 28, 2024 · Updated 2 years ago
Ryaang / EventRAG
☆21 · Feb 16, 2025 · Updated last year
ibaiGorordo / Tapir-Pytorch-Inference
Minimal code for Tapir model inference in PyTorch
☆18 · Aug 13, 2024 · Updated last year
360AILABNLP / 360LayoutAnalysis
☆28 · Oct 14, 2024 · Updated last year
LingyvKong / OneChart
[ACM'MM 2024 Oral] Official code for "OneChart: Purify the Chart Structural Extraction via One Auxiliary Token"
☆266 · Apr 14, 2025 · Updated last year
RapidAI / RapidLayout
Layout analysis for Chinese and English documents
☆275 · Mar 24, 2026 · Updated 3 months ago
RapidAI / RapidTableDetection
Detects and extracts table regions from images in various scenarios, and corrects perspective and rotation problems
☆119 · Dec 10, 2024 · Updated last year
IBM / SynthTabNet
Dataset of PNG images from synthetically generated table layouts with annotations in JSONL files
☆154 · Sep 17, 2025 · Updated 10 months ago
WenmuZhou / TableGeneration
Generates table images by rendering them in a browser
☆238 · Apr 10, 2024 · Updated 2 years ago
leancloud / delphi-sdk
[NOT MAINTAINED] Delphi SDK for LeanCloud BaaS demo
☆12 · Dec 6, 2019 · Updated 6 years ago
racinmat / GTAVisionExport-postprocessing
☆11 · Jan 27, 2020 · Updated 6 years ago
ducanh841988 / awesome-math-recognition
This repository summarizes publications on the recognition of handwritten mathematical expressions
☆15 · Oct 27, 2017 · Updated 8 years ago
ntnu-ai-lab / eSNN
eSNN - Learning similarity measure from data
☆12 · Nov 28, 2019 · Updated 6 years ago
liuyifan6613 / DocBank-Document-Enhancement-Dataset
DocBank document image enhancement dataset. The dataset is intended for document image enhancement, with tasks including: seal detection and removal; watermark detection and removal; document deblurrin…
☆48 · Oct 22, 2024 · Updated last year
LukeForeverYoung / UReader
☆142 · Feb 13, 2024 · Updated 2 years ago
FreeOCR-AI / yolo-doclaynet
YOLO models trained on DocLayNet - power your document intelligence with layout analysis
☆158 · Mar 10, 2026 · Updated 4 months ago
HCIILAB / EPHOIE
☆110 · Feb 16, 2021 · Updated 5 years ago
doc-analysis / XFUND
XFUND: A Multilingual Form Understanding Benchmark
☆223 · Jul 15, 2022 · Updated 4 years ago
volkancirik / refer360
Repository for the ACL 2020 paper "Refer360°: A Referring Expression Recognition Dataset in 360° Images"
☆14 · Jun 26, 2021 · Updated 5 years ago
surelle-ha / StoryAI-Visualizer-Server
AI-based image/scenery and narration generation based on story content. Developed using ExpressJS. Rendered using VueJS. …
☆10 · Jun 2, 2024 · Updated 2 years ago
uees / tdxStock
Collects and analyzes financial data for Shanghai and Shenzhen stocks
☆22 · Mar 3, 2023 · Updated 3 years ago
hamdiboukamcha / Yolo-V12-cpp-TensorRT
A YOLOv12 project written in C++ and optimized using NVIDIA TensorRT
☆26 · Oct 22, 2025 · Updated 9 months ago
wkentaro / yolo-world-onnx
ONNX models of YOLO-World (an open-vocabulary object detector).
☆28 · Jun 29, 2024 · Updated 2 years ago
valpackett / mail2elasticsearch
Fast ElasticSearch indexer for MIME email
☆13 · Feb 3, 2018 · Updated 8 years ago
caraxl / bge-reranker-base-api
☆17 · Jan 5, 2024 · Updated 2 years ago
RapidAI / TableStructureRec
Collects the best currently open-source table recognition models, improves their pre- and post-processing, and converts the models to ONNX
☆954 · Aug 3, 2025 · Updated 11 months ago
PassByYou888 / PascalString
string port all platforms
☆15 · May 27, 2020 · Updated 6 years ago
aFlyBird0 / cubox-archiver
Transfers the "archived" content of the Cubox read-it-later app to other places (such as Notion) to get around its limit of storing only 200 items
☆11 · Dec 31, 2024 · Updated last year
yilunzhao / RobuT
Data and code for ACL 2023 paper "RobuT: A Systematic Study of Table QA Robustness Against Human-Annotated Adversarial Perturbations"
☆15 · Feb 8, 2024 · Updated 2 years ago
ucaslcl / Fox
Official code for "Fox: Focus Anywhere for Fine-grained Multi-page Document Understanding"
☆196 · May 31, 2024 · Updated 2 years ago
youngjoey-ai / tracerag
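A RAG project that emphasizes engineering rigor, observability, testability, and extensibility. TraceRAG's goal is not merely to "generate" an answer, but to split document ingestion, chunking, embedding, retrieval, source-attributed answering, evaluation, and subsequent tracing into independently verifiable stages, gradually evolving into a maintainable, explainable, and reviewable production-grade RAG.
☆15 · Apr 2, 2026 · Updated 3 months ago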