deepopinion / anls_star_metric
Official implementation of the ANLS* metric
☆19 Updated 2 weeks ago
Alternatives and similar repositories for anls_star_metric:
Users who are interested in anls_star_metric are comparing it to the libraries listed below.
- The code related to the baselines from NeurIPS 2021 paper "DUE: End-to-End Document Understanding Benchmark." ☆36 Updated 2 years ago
- The official repository for Efficient Long-Text Understanding Using Short-Text Models (Ivgi et al., 2022) paper ☆70 Updated last year
- We identify the desiderata for a comprehensive benchmark and propose Visually Rich Document Understanding (VRDU). VRDU contains two datas… ☆80 Updated 2 years ago
- 🔍 A statutory article retrieval dataset in French. (ACL 2022) ☆39 Updated last year
- Pretraining Efficiently on S2ORC! ☆163 Updated 6 months ago
- [Data + code] ExpertQA : Expert-Curated Questions and Attributed Answers ☆128 Updated last year
- Incorporating VIsual LAyout Structures for Scientific Text Classification ☆176 Updated 2 years ago
- The WordScape repository contains code for the WordScape pipeline to create datasets to train document understanding models. ☆35 Updated last year
- Glot500: Scaling Multilingual Corpora and Language Models to 500 Languages -- ACL 2023 ☆100 Updated last year
- multimodal document analysis ☆164 Updated 11 months ago
- Repo for training MLMs, CLMs, or T5-type models on the OLM pretraining data; it should also work with any Hugging Face text dataset. ☆93 Updated 2 years ago
- Okapi: Instruction-tuned Large Language Models in Multiple Languages with Reinforcement Learning from Human Feedback ☆95 Updated last year
- Code, datasets, and checkpoints for the paper "Improving Passage Retrieval with Zero-Shot Question Generation (EMNLP 2022)" ☆101 Updated 2 years ago
- A multilingual version of MS MARCO passage ranking dataset ☆145 Updated last year
- Multilingual Large Language Models Evaluation Benchmark ☆123 Updated 8 months ago
- GISTEmbed: Guided In-sample Selection of Training Negatives for Text Embeddings ☆42 Updated last year
- Long-context pretrained encoder-decoder models ☆94 Updated 2 years ago
- A framework for few-shot evaluation of autoregressive language models. ☆103 Updated last year
- Dataset collection and preprocessing framework for NLP extreme multitask learning ☆180 Updated 4 months ago
- Code for Multilingual Eval of Generative AI paper published at EMNLP 2023 ☆68 Updated last year
- Official Implementation of TFLOP: Table Structure Recognition Framework with Layout Pointer Mechanism ☆26 Updated last week
- Mr. TyDi is a multi-lingual benchmark dataset built on TyDi, covering eleven typologically diverse languages. ☆75 Updated 3 years ago
- Minimal PyTorch implementation of BM25 (with sparse tensors) ☆101 Updated last year
- OCR Annotations from Amazon Textract for Industry Documents Library ☆103 Updated 2 years ago