chongzhangFDU / Token-Path-Prediction-Datasets
This is the official repository of the revised datasets FUNSD-r and CORD-r, introduced in the EMNLP 2023 paper "Reading Order Matters: Information Extraction from Visually-rich Documents by Token Path Prediction".
☆16Updated last year
Alternatives and similar repositories for Token-Path-Prediction-Datasets:
Users who are interested in Token-Path-Prediction-Datasets are comparing it to the repositories listed below
- This is the official repository of the EMNLP 2023 paper Reading Order Matters: Information Extraction from Visually-rich Documents by Tok…☆18Updated last year
- 🌳CED: Catalog Extraction from Documents☆16Updated last year
- ☆82Updated 2 years ago
- ☆125Updated 2 weeks ago
- A Curated List of Awesome Table Structure Recognition (TSR) Research. Including models, papers, datasets and codes. Continuously updating…☆178Updated 7 months ago
- Dataset and scripts for HRDoc☆36Updated last year
- A curated list of papers about key information extraction.☆93Updated 4 months ago
- ☆58Updated 10 months ago
- XFUND: A Multilingual Form Understanding Benchmark☆200Updated 2 years ago
- 1st Solution For Conversational Multi-Doc QA Workshop & International Challenge @ WSDM'24 - Xiaohongshu.Inc☆161Updated last year
- CDLA: A Chinese document layout analysis (CDLA) dataset☆263Updated 3 years ago
- Chinese document classification with layoutlmv3 and layoutxlm☆43Updated 2 years ago
- General layout analysis | Chinese document parsing | Document Layout Analysis | layout parser☆46Updated 10 months ago
- T2Ranking: A large-scale Chinese benchmark for passage ranking.☆157Updated last year
- ICDAR 2024 Table OCR Model☆33Updated 5 months ago
- An unofficial implementation of augment-XY-CUT in XYLayoutLM☆27Updated 2 years ago
- A large scale camera-taken table detection and recognition dataset.☆128Updated last year
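- This project uses layoutlmv3 for training and inference on Chinese images. It mainly addresses three problems: 1. standardizing data into a usable training dataset format; 2. modifying the layoutlmv3-base-chinese tokenization; 3. splitting text longer than 512 and applying a sliding window☆44Updated 8 months ago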
- ☆12Updated 8 months ago
- 360LayoutAnalysis, a series of Document Analysis Models and Datasets developed by 360 AI Research Institute☆280Updated 7 months ago
- Table Structure Recognition☆72Updated 2 years ago
- sentence-transformers to onnx: makes sbert model inference faster☆163Updated 3 years ago
- Implementation of research paper "Deep Splitting and Merging for Table Structure Decomposition"☆61Updated 2 years ago
- ☆87Updated 4 months ago
- Code & Data for our Paper "NaSGEC: Multi-Domain Chinese Grammatical Error Correction for Native Speaker Texts" (ACL 2023 Findings)☆86Updated 2 months ago
- MPB (Miner-PDF-Benchmark) is an end-to-end PDF document comprehension evaluation suite designed for large-scale model data scenarios.☆22Updated 4 months ago
- text embedding☆145Updated last year
- MTL-TabNet: Multi-task Learning based Model for Image-based Table Recognition☆100Updated 11 months ago
- An unofficial Pytorch implementation of ERNIE-Layout which is originally released through PaddleNLP.☆105Updated last year
- In visual information extraction tasks, uses OCR recognition results to regularize the answers of multimodal large models☆30Updated 4 months ago