allenai/mmda

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/allenai/mmda)

allenai / mmda

multimodal document analysis

☆166

Alternatives and similar repositories for mmda

Users that are interested in mmda are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

allenai / vila
View on GitHub
Incorporating VIsual LAyout Structures for Scientific Text Classification
☆180Mar 18, 2023Updated 3 years ago
allenai / smashed
View on GitHub
SMASHED is a toolkit designed to apply transformations to samples in datasets, such as fields extraction, tokenization, prompting, batchi…
☆35May 24, 2024Updated 2 years ago
allenai / pawls
View on GitHub
Software that makes labeling PDFs easy.
☆434May 13, 2024Updated 2 years ago
zoranmedic / mdcr
View on GitHub
Benchmark dataset for the evaluation of scientific article representations on the task of citation recommendation across various scientif…
☆12Oct 21, 2022Updated 3 years ago
uakarsh / TiLT-Implementation
View on GitHub
Implementation of the paper: Going Full-TILT Boogie on Document Understanding with Text-Image-Layout Transformer.
☆18Apr 23, 2023Updated 3 years ago
GPU virtual machines on DigitalOcean Gradient AI • Ad
Get to production fast with high-performance AMD and NVIDIA GPUs you can spin up in seconds. The definition of operational simplicity.
jpWang / LiLT
View on GitHub
Official PyTorch implementation of LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understan…
☆366Oct 31, 2022Updated 3 years ago
prohandler / GS-Bulk-Emails
View on GitHub
Google App Scripts that sends a number of emails from the specific number and that tracks the open status of each email
☆17Dec 11, 2024Updated last year
allenai / s2_fos
View on GitHub
☆34Jan 2, 2024Updated 2 years ago
applicaai / CCpdf
View on GitHub
Index of URLs to pdf files all over the internet and scripts
☆25May 2, 2023Updated 3 years ago
allenai / papermage
View on GitHub
library supporting NLP and CV research on scientific papers
☆800Nov 8, 2024Updated last year
doc-analysis / DocBank
View on GitHub
DocBank: A Benchmark Dataset for Document Layout Analysis
☆653Aug 12, 2024Updated last year
gipplab / pdf-benchmark
View on GitHub
A Benchmark of PDF Information Extraction Tools using a Multi-Task and Multi-Domain Evaluation Framework for Academic Documents
☆32Dec 8, 2022Updated 3 years ago
allenai / mup
View on GitHub
☆18Oct 22, 2022Updated 3 years ago
allenai / s2orc-doc2json
View on GitHub
Parsers for scientific papers (PDF2JSON, TEX2JSON, JATS2JSON)
☆473Apr 11, 2024Updated 2 years ago
Managed hosting for WordPress and PHP on Cloudways • Ad
Managed hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
Layout-Parser / layout-parser
View on GitHub
A Unified Toolkit for Deep Learning Based Document Image Analysis
☆5,765Aug 15, 2024Updated last year
YingtongDou / CrossFake
View on GitHub
A cross-lingual COVID-19 fake news dataset
☆14Oct 14, 2021Updated 4 years ago
ibm-aur-nlp / PubLayNet
View on GitHub
☆1,053Jul 9, 2025Updated last year
herobd / FUDGE
View on GitHub
Code for the ICDAR2021 paper "Visual FUDGE: Form Understanding via Dynamic Graph Editing"
☆33Mar 4, 2022Updated 4 years ago
applicaai / kleister-nda
View on GitHub
☆61Aug 18, 2021Updated 4 years ago
ibm-aur-nlp / PubTabNet
View on GitHub
☆484Jul 8, 2025Updated last year
mscarey / legislice
View on GitHub
API client for fetching and comparing passages from legislation
☆14Jun 29, 2026Updated 3 weeks ago
fdalvi / analyzing-redundancy-in-pretrained-transformer-models
View on GitHub
Code for Analyzing Redundancy in Pretrained Transformer Models accepted at EMNLP 2020
☆14Oct 6, 2020Updated 5 years ago
momo-journey / CDial-GPT-NEZHA
View on GitHub
pytorch版基于gpt+nezha的中文多轮Cdial
☆11Oct 22, 2022Updated 3 years ago
Deploy on Railway without the complexity - Free Credits Offer • Ad
Connect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
phucty / wtabhtml
View on GitHub
Tool to parse wiki tables from the HTML dump of Wikipedia
☆11Jun 12, 2022Updated 4 years ago
allenai / s2orc
View on GitHub
S2ORC: The Semantic Scholar Open Research Corpus: https://www.aclweb.org/anthology/2020.acl-main.447/
☆1,075Apr 26, 2024Updated 2 years ago
allenai / open-mds
View on GitHub
The corresponding code for our paper: "Exploring the Challenges of Open Domain Multi-Document Summarization". Do not hesitate to open an …
☆33Jun 24, 2023Updated 3 years ago
tstanislawek / awesome-document-understanding
View on GitHub
A curated list of resources for Document Understanding (DU) topic
☆1,525Jun 2, 2023Updated 3 years ago
mscarey / justopinion
View on GitHub
Download client for legal opinions
☆13Jun 12, 2026Updated last month
allenai / SciREX
View on GitHub
Data/Code Repository for https://api.semanticscholar.org/CorpusID:218470122
☆140Jul 25, 2024Updated 2 years ago
stefan-it / ukrainian-electra
View on GitHub
Ukrainian ELECTRA model
☆12Mar 11, 2023Updated 3 years ago
SCUT-DLVCLab / Document-AI-Recommendations
View on GitHub
Algorithms, papers, datasets, performance comparisons for Document AI.
☆209Mar 1, 2025Updated last year
UBIAI / layout_lm_tutorial
View on GitHub
☆15Jun 16, 2021Updated 5 years ago
Virtual machines for every use case on DigitalOcean • Ad
Get dependable uptime with 99.99% SLA, simple security tools, and predictable monthly pricing with DigitalOcean's virtual machines, called Droplets.
kermitt2 / entity-fishing
View on GitHub
A machine learning tool for fishing entities
☆268Feb 27, 2026Updated 4 months ago
allenai / specter
View on GitHub
SPECTER: Document-level Representation Learning using Citation-informed Transformers
☆586Jun 12, 2023Updated 3 years ago
ZhaofengWu / SIFT
View on GitHub
☆14Aug 3, 2022Updated 3 years ago
allenai / unifew
View on GitHub
Unifew: Unified Fewshot Learning Model
☆18Sep 10, 2021Updated 4 years ago
argilla-io / adept-augmentations
View on GitHub
A Python library aimed at dissecting and augmenting NER training data.
☆60May 11, 2023Updated 3 years ago
jiangshdd / ReviewCritique
View on GitHub
☆13Sep 26, 2024Updated last year
neelguha / legal-segmenter
View on GitHub
A simple library for segmenting legal texts
☆18Apr 22, 2023Updated 3 years ago