Extracting Semi-Structured Data from PDFs on a Large Scale
☆52 · Jul 7, 2022 · Updated 3 years ago
Alternatives and similar repositories for pdfreader
Users interested in pdfreader are comparing it to the libraries listed below.
- `pdfstructure` detects, splits and organizes the document's text content into its natural structure as envisioned by the author. ☆105 · Apr 1, 2024 · Updated last year
- This library builds a graph representation of the content of PDFs. The graph is then clustered, resulting page segments are classified an… ☆23 · Sep 11, 2020 · Updated 5 years ago
- Software for building the IR Anthology. ☆11 · Sep 19, 2023 · Updated 2 years ago
- A Python Interface to Reproducibility Measures of System-Oriented IR Experiments ☆11 · Dec 2, 2025 · Updated 3 months ago
- A step-by-step C# implementation of the Docstrum algorithm ☆24 · Dec 13, 2020 · Updated 5 years ago
- Vue.js 3 + Quasar Framework UI design for an e-commerce admin ☆10 · Sep 11, 2021 · Updated 4 years ago
- The notebook component of a PySpark application to calculate value-at-risk for a portfolio of securities ☆11 · Jan 14, 2017 · Updated 9 years ago
- Provenance Management for Data Science Notebooks ☆14 · Dec 2, 2021 · Updated 4 years ago
- Tools for natural-language-text-aware PDF structure analysis ☆15 · Mar 11, 2022 · Updated 4 years ago
- pivottablejs for air-gapped systems ☆13 · Aug 14, 2024 · Updated last year
- A starter template for creating web applications with Google Apps Script & Svelte ☆10 · Oct 20, 2023 · Updated 2 years ago
- Conversational Recommender System Evaluation via Simulation ☆19 · Updated this week
- CEU Python for finance course material ☆22 · Feb 25, 2020 · Updated 6 years ago
- A Python utility for indexing file lines. Best demo honourable mention at ECIR 2024. ☆23 · Nov 9, 2025 · Updated 4 months ago
- Functional and structural analysis of tables in research papers (table disentangling) ☆20 · Aug 7, 2017 · Updated 8 years ago
- Code, datasets, and explanations for some basic natural language tasks and models. ☆11 · Dec 9, 2020 · Updated 5 years ago
- A Shiny app that fetches users' activity and interacts with an R Markdown (PDF/Word) report ☆17 · Apr 22, 2019 · Updated 6 years ago
- The source code for the TIRA Shared Task Platform ☆17 · Updated this week
- RedRock - Mobile application prototype using Apache Spark, Twitter and Elasticsearch ☆14 · Sep 10, 2018 · Updated 7 years ago
- ipywidgets GUI elements for HyperSpy ☆11 · Feb 1, 2026 · Updated last month
- ERPL is a DuckDB extension to connect to API-based ecosystems via standard interfaces like OData, GraphQL and REST. This works e.g. for S… ☆25 · Mar 12, 2026 · Updated last week
- Deep neural network to extract intelligent information from invoice documents using PyTorch. ☆16 · Aug 31, 2022 · Updated 3 years ago
- init ☆11 · Sep 30, 2017 · Updated 8 years ago
- 𝑄𝑏𝑖𝑎𝑠 - A Dataset on Media Bias in Search Queries and Query Suggestions ☆20 · Mar 1, 2023 · Updated 3 years ago
- Use Processing to request the GPS location of Dutch trains via API and save the results to a local file. ☆21 · Jan 17, 2022 · Updated 4 years ago
- Audio preprocessing and fine-tuning of the wav2vec2-large-xlsr model on the AI4D Baamtu Datamation Automatic Speech Recognition in WOLOF data. ☆18 · Nov 13, 2021 · Updated 4 years ago
- Fuzzy matchers for chai, based on underscore ☆24 · Mar 17, 2016 · Updated 10 years ago
- OptimSeed - Seed Word Selection for Weakly-Supervised Text Classification [NAACL SRW 2021] ☆14 · Mar 29, 2021 · Updated 4 years ago
- A framework-agnostic client-side JavaScript library for logging user interactions on webpages. ☆19 · Feb 3, 2022 · Updated 4 years ago
- A Domain-Specific Language (DSL) for designing experiments in psychology ☆15 · Feb 21, 2022 · Updated 4 years ago
- Tutorial Apps for Learning R ☆18 · Dec 28, 2017 · Updated 8 years ago