chongzhangFDU / Token-Path-Prediction-Datasets
This is the official repository of the revised datasets FUNSD-r and CORD-r, introduced in the EMNLP 2023 paper "Reading Order Matters: Information Extraction from Visually-rich Documents by Token Path Prediction".
☆14Updated 6 months ago
Related projects:
- 🌳CED: Catalog Extraction from Documents☆15Updated last year
- This is the official repository of the EMNLP 2023 paper Reading Order Matters: Information Extraction from Visually-rich Documents by Tok…☆17Updated 6 months ago
- ☆74Updated 2 years ago
- General-purpose layout analysis | Chinese document parsing | Document Layout Analysis | layout parser☆41Updated 3 months ago
- XFUND: A Multilingual Form Understanding Benchmark☆182Updated 2 years ago
- Implementation of paper: HLATR: Enhance Multi-stage Text Retrieval with Hybrid List Aware Transformer Reranking☆65Updated last year
- T2Ranking: A large-scale Chinese benchmark for passage ranking.☆144Updated last year
- A Curated List of Awesome Table Structure Recognition (TSR) Research, including models, papers, datasets, and code. Continuously updating…☆111Updated last week
- A Faster LayoutReader Model based on LayoutLMv3, Sort OCR bboxes to reading order.☆76Updated 3 months ago
- An unofficial PyTorch implementation of ERNIE-Layout, which was originally released through PaddleNLP.☆97Updated 10 months ago
- LLM for NER☆47Updated last month
- ☆25Updated this week
- ☆19Updated last year
- ☆50Updated 6 months ago
- An unofficial implementation of augment-XY-CUT in XYLayoutLM☆25Updated 2 years ago
- TUTA and ForTaP for Structure-Aware and Numerical-Reasoning-Aware Table Pre-Training☆96Updated last year
- text embedding☆133Updated last year
- NTK scaled version of ALiBi position encoding in Transformer.☆64Updated last year
- Code & Data for our Paper "NaSGEC: Multi-Domain Chinese Grammatical Error Correction for Native Speaker Texts" (ACL 2023 Findings)☆73Updated last year
- Chinese document classification with layoutlmv3 and layoutxlm☆38Updated last year
- ☆56Updated last month
- code for piccolo embedding model from SenseTime☆93Updated 4 months ago
- ☆103Updated 7 months ago
- A curated list of papers about key information extraction.☆72Updated last month
- Code implementation of Baichuan's Dynamic NTK-ALiBi: inference on longer texts without fine-tuning☆45Updated last year
- ☆57Updated last year
- The Corpus & Code for the EMNLP 2022 paper "FCGEC: Fine-Grained Corpus for Chinese Grammatical Error Correction" | FCGEC Chinese grammatical error correction corpus and STG model☆104Updated last month
- code and data for "CSCD-NS: a Chinese Spelling Check Dataset for Native Speakers"☆51Updated last month
- ☆111Updated 6 months ago
- The WordScape repository contains code for the WordScape pipeline to create datasets to train document understanding models.☆32Updated 9 months ago