hint-lab / doctrack
Dataset for EMNLP'23 Paper "DocTrack: A Visually-Rich Document Dataset Really Aligned with Human Eye Movement for Machine Reading"
☆10, updated last year
Alternatives and similar repositories for doctrack
Users who are interested in doctrack are comparing it to the repositories listed below.
- An unofficial implementation of augment-XY-CUT in XYLayoutLM ☆28, updated 3 years ago
- 🌳CED: Catalog Extraction from Documents ☆16, updated last year
- Dataset and scripts for HRDoc ☆39, updated 2 years ago
- Winning system (DAMO-NLP) of the SemEval 2022 MultiCoNER shared task in 10 out of 13 tracks. ☆184, updated 2 years ago
- ☆87, updated 3 years ago
- XFUND: A Multilingual Form Understanding Benchmark ☆207, updated 3 years ago
- [ACL 2024] IEPile: A Large-Scale Information Extraction Corpus ☆197, updated 6 months ago
- A set of Python scripts for preprocessing the Wikidata JSON dump and running simple queries in an efficient manner. ☆125, updated 9 months ago
- Code repo for ACL22 paper "DeepStruct: Pretraining of Language Models for Structure Prediction" ☆85, updated 2 years ago
- [IJCAI 2021] Document-level Relation Extraction as Semantic Segmentation ☆146, updated 2 years ago
- Implementation of paper: HLATR: Enhance Multi-stage Text Retrieval with Hybrid List Aware Transformer Reranking ☆72, updated 2 years ago
- An unofficial PyTorch implementation of ERNIE-Layout, which was originally released through PaddleNLP. ☆106, updated last year
- Example codebase for fine-tuning LayoutLMv3 on DocVQA ☆52, updated 2 years ago
- Two approaches for robust TableQA: 1) ITR is a general-purpose retrieval-based approach for handling long tables in TableQA transformer m… ☆39, updated last year
- The WordScape repository contains code for the WordScape pipeline to create datasets to train document understanding models. ☆37, updated last year
- ☆140, updated 2 months ago
- This is the official repository of the revised datasets FUNSD-r and CORD-r, introduced in EMNLP 2023 paper Reading Order Matters: Informa… ☆17, updated last year
- ☆252, updated 2 years ago
- TeX compilation service that makes use of arXiv.org's AutoTeX library. ☆34, updated last month
- Towards Real-World Writing Assistance: A Chinese Character Checking Benchmark with Faked and Misspelled Characters ☆17, updated last year
- A large-scale complex question answering evaluation of ChatGPT and similar large language models ☆40, updated last year
- TUTA and ForTaP for Structure-Aware and Numerical-Reasoning-Aware Table Pre-Training ☆116, updated 8 months ago
- [EMNLP2024] Aligning Large Language Models on Information Extraction ☆51, updated 8 months ago
- TAT-QA (Tabular And Textual dataset for Question Answering) contains 16,552 questions associated with 2,757 hybrid contexts from real-wor… ☆112, updated 7 months ago
- ☆213, updated 2 years ago
- T2Ranking: A large-scale Chinese benchmark for passage ranking. ☆159, updated 2 years ago
- This is the code for our KILT leaderboard submissions (KGI + Re2G models). ☆156, updated 2 months ago
- Evaluation of Natural Language Processing (NLP) tools for the Ancient Chinese language ☆37, updated 5 months ago
- Datasets and Evaluation Scripts for CompHRDoc ☆46, updated 4 months ago
- ☆8, updated 7 months ago