allenai / mmda
multimodal document analysis
☆160 · Updated 5 months ago
Related projects
Alternatives and complementary repositories for mmda
- Incorporating VIsual LAyout Structures for Scientific Text Classification ☆173 · Updated last year
- ☆82 · Updated 6 months ago
- ☆147 · Updated 5 months ago
- Coreference resolution for English, French, German and Polish, optimised for limited training data and easily extensible for further lang… ☆117 · Updated 6 months ago
- Parsers for scientific papers (PDF2JSON, TEX2JSON, JATS2JSON) ☆348 · Updated 7 months ago
- Pretraining Efficiently on S2ORC! ☆136 · Updated 3 weeks ago
- ☆74 · Updated 2 years ago
- A multi-lingual approach to AllenNLP CoReference Resolution along with a wrapper for spaCy. ☆103 · Updated 7 months ago
- Software that makes labeling PDFs easy. ☆391 · Updated 6 months ago
- ☆55 · Updated 3 years ago
- Powerful unsupervised domain adaptation method for dense retrieval. Requires only an unlabeled corpus and yields massive improvement: "GPL: … ☆323 · Updated last year
- A data set based on all arXiv publications, pre-processed for NLP, including structured full-text and citation network ☆259 · Updated last month
- This repository provides scripts for evaluating NLP models on the LEXTREME benchmark, a set of diverse multilingual tasks in legal NLP ☆20 · Updated 10 months ago
- SpanMarker for Named Entity Recognition ☆401 · Updated 3 months ago
- Code accompanying the submission "Structural Text Segmentation of Legal Documents" by Aumiller et al. ☆96 · Updated last year
- Repository for Zheng and Guha et al., 2021, "When Does Pretraining Help? Assessing Self-Supervised Learning for Law and the CaseHOLD Data… ☆84 · Updated last year
- [EMNLP 2023 Demo] fabricator - annotating and generating datasets with large language models. ☆103 · Updated 6 months ago
- Mining Legal Arguments in Court Decisions - Data and software ☆64 · Updated last year
- The code related to the baselines from NeurIPS 2021 paper "DUE: End-to-End Document Understanding Benchmark." ☆35 · Updated last year
- 💫 spaCy wrapper for ConceptNet 💫 ☆88 · Updated last year
- Data and evaluation code for the paper WikiNEuRal: Combined Neural and Knowledge-based Silver Data Creation for Multilingual NER (EMNLP 2… ☆66 · Updated last year
- We identify the desiderata for a comprehensive benchmark and propose Visually Rich Document Understanding (VRDU). VRDU contains two datas… ☆74 · Updated last year
- This repository contains the code for the paper 'PARM: Paragraph Aggregation Retrieval Model for Dense Document-to-Document Retrieval' pu… ☆40 · Updated 2 years ago
- The official repository for Efficient Long-Text Understanding Using Short-Text Models (Ivgi et al., 2022) paper ☆67 · Updated last year
- [Data + code] ExpertQA: Expert-Curated Questions and Attributed Answers ☆122 · Updated 8 months ago
- A Python Search Engine for Humans 🥸 ☆185 · Updated 6 months ago
- Publicly released code for the LAMBERT model ☆102 · Updated 3 years ago
- Data and additional information regarding the paper: Contract Discovery. Dataset and a Few-Shot Semantic Retrieval Challenge with Competi… ☆29 · Updated 4 years ago
- Data and models for the SciFact verification task. ☆225 · Updated last year
- Zero and Few shot named entity & relationships recognition ☆349 · Updated 2 months ago