A basic tool that extracts the structure from the PDF files of scientific articles.
☆77 · Jan 4, 2022 · Updated 4 years ago
Alternatives and similar repositories for pdfact
Users who are interested in pdfact are comparing it to the libraries listed below.
- The repository of Icecite, a research paper management system. ☆15 · Mar 29, 2018 · Updated 8 years ago
- A project about benchmarking and evaluating existing PDF extraction tools on their semantic abilities to extract the body texts from PDF … ☆73 · Nov 7, 2020 · Updated 5 years ago
- Named Entity Disambiguation and Linking ☆16 · May 24, 2024 · Updated 2 years ago
- Table understanding dataset for comparative evaluation of different table understanding algorithms ☆13 · Jun 15, 2018 · Updated 7 years ago
- PDF Extraction Toolkit ☆43 · Nov 23, 2020 · Updated 5 years ago
- Structured Data from PDF image-based files ☆91 · Mar 1, 2013 · Updated 13 years ago
- SIGIR'20: An Analysis of BERT in Document Ranking ☆21 · Jul 27, 2020 · Updated 5 years ago
- Keyphrase Extraction Prototypes ☆15 · Nov 24, 2016 · Updated 9 years ago
- Systematic Review Query Visualisation and Understanding Interface ☆17 · Dec 5, 2025 · Updated 6 months ago
- LA-PDFText is a system for extracting accurate text from PDF-based research articles (and an interface to be able to improve performance … ☆82 · Mar 2, 2018 · Updated 8 years ago
- Tokenize and clean strings in Python ☆11 · Jan 11, 2018 · Updated 8 years ago
- A toolkit for asynchronously validating dense retriever checkpoints during training. ☆27 · Aug 10, 2023 · Updated 2 years ago
- A Python interface to PISA ☆37 · Updated this week
- XSLT application to generate MARCXML from BIBFRAME RDF/XML ☆19 · Updated this week
- Ollie is an open information extractor that uses dependency parses. ☆12 · Sep 27, 2013 · Updated 12 years ago
- Spell checker using Brill and Moore's noisy channel error model ☆13 · Jan 9, 2019 · Updated 7 years ago
- Blacklight IIIF Content Search plugin ☆14 · Mar 17, 2026 · Updated 2 months ago
- Multi-Entity Extraction Framework for Academic Documents (with default extraction tools) ☆31 · Oct 3, 2023 · Updated 2 years ago
- A tool for correcting misspellings in textual input using the Noisy Channel Model. ☆11 · Sep 26, 2020 · Updated 5 years ago
- ☆16 · Apr 30, 2026 · Updated last month
- Named Entity Recognition with the Nametag Maximum Entropy Markov model ☆12 · Feb 9, 2026 · Updated 4 months ago
- Science Parse parses scientific papers (in PDF form) and returns them in structured form. ☆700 · May 26, 2024 · Updated 2 years ago
- Convert ALTO XML to plain text + minimal metadata ☆17 · Oct 17, 2024 · Updated last year
- ☆20 · Jul 22, 2021 · Updated 4 years ago
- A SPARQL language server ☆42 · Updated this week
- Recommendation engine for scholarly articles ☆12 · Oct 22, 2019 · Updated 6 years ago
- Code and Data for paper: Estimating Attention Flow in Online Video Networks (CSCW '19) ☆12 · Nov 19, 2019 · Updated 6 years ago
- Jurisdiction ID and abbreviation data files for use with Jurism and other projects. ☆44 · Nov 8, 2023 · Updated 2 years ago
- Mirror of the official development repository of PHAIDRA. We monitor our public github repo, so contributions via issues & pull requests… ☆22 · Updated this week
- Analyze XML extracted from PDFs (e.g. from TET or PDFMiner) ☆20 · Jan 11, 2018 · Updated 8 years ago
- Liberate all kinds of data from PDF and other unstructured formats and make the information machine-readable and visualizable for popul… ☆32 · Jun 1, 2018 · Updated 8 years ago
- Basic RDF Datatypes ☆15 · Feb 23, 2026 · Updated 3 months ago
- TeXoo – A Zoo of Text Extractors ☆18 · Jun 2, 2020 · Updated 6 years ago
- Incorporating VIsual LAyout Structures for Scientific Text Classification ☆180 · Mar 18, 2023 · Updated 3 years ago
- A tool for analyzing and visualizing discrete temporal events ☆17 · Aug 15, 2018 · Updated 7 years ago
- Keyphrase Generation for Scientific Document Retrieval ☆11 · Oct 2, 2020 · Updated 5 years ago
- ☆32 · Aug 20, 2021 · Updated 4 years ago
- INCLUSIFY is a tool to support the practical use of diversity-sensitive language in German. ☆12 · Sep 14, 2022 · Updated 3 years ago
- ELOT Literate Ontology Tool ☆29 · Updated this week