allenai / mmda
multimodal document analysis
☆162 · Updated 8 months ago
Alternatives and similar repositories for mmda:
Users who are interested in mmda are comparing it to the libraries listed below.
- Incorporating VIsual LAyout Structures for Scientific Text Classification ☆175 · Updated last year
- Pretraining Efficiently on S2ORC! ☆156 · Updated 3 months ago
- Code accompanying the submission "Structural Text Segmentation of Legal Documents" by Aumiller et al. ☆96 · Updated last year
- A dataset based on all arXiv publications, preprocessed for NLP, including structured full text and a citation network ☆285 · Updated 4 months ago
- Mining Legal Arguments in Court Decisions - Data and software ☆66 · Updated last year
- Inquisitive Parrots for Search ☆186 · Updated 11 months ago
- Repository for Zheng and Guha et al., 2021, "When Does Pretraining Help? Assessing Self-Supervised Learning for Law and the CaseHOLD Data… ☆87 · Updated last year
- A multilingual approach to AllenNLP coreference resolution along with a wrapper for spaCy. ☆105 · Updated 10 months ago
- SciRepEval benchmark training and evaluation scripts ☆72 · Updated 9 months ago
- Logical structure analysis for visually structured documents ☆86 · Updated 2 years ago
- Software that makes labeling PDFs easy. ☆405 · Updated 9 months ago
- Parsers for scientific papers (PDF2JSON, TEX2JSON, JATS2JSON) ☆356 · Updated 10 months ago
- This repository provides scripts for evaluating NLP models on the LEXTREME benchmark, a set of diverse multilingual tasks in legal NLP ☆21 · Updated last year
- A dataset for pretraining language models targeted at legal tasks. ☆126 · Updated 2 years ago
- Streamlit Named Entity Recognition (NER) annotation custom component ☆39 · Updated 2 years ago
- 💫 spaCy wrapper for ConceptNet 💫 ☆89 · Updated last year
- Powerful unsupervised domain adaptation method for dense retrieval. Requires only an unlabeled corpus and yields massive improvement: "GPL: … ☆328 · Updated last year
- We identify the desiderata for a comprehensive benchmark and propose Visually Rich Document Understanding (VRDU). VRDU contains two datas… ☆76 · Updated 2 years ago
- [EMNLP 2023 Demo] fabricator - annotating and generating datasets with large language models. ☆104 · Updated 9 months ago
- A Python library aimed at dissecting and augmenting NER training data. ☆58 · Updated last year
- A multilingual version of MS MARCO passage ranking dataset ☆143 · Updated last year
- ReadingBank: A Benchmark Dataset for Reading Order Detection ☆102 · Updated 5 months ago
- The pipeline for the OSCAR corpus ☆166 · Updated last year
- This repository contains the code for the paper 'PARM: Paragraph Aggregation Retrieval Model for Dense Document-to-Document Retrieval' pu… ☆40 · Updated 3 years ago