aphp / edspdf
EDS-PDF is a generic, pure-Python framework for text extraction from PDF documents. It provides the machinery to classify text blocks as body or metadata, using rule-based or machine-learning-based approaches.
☆51 Updated 5 months ago
Alternatives and similar repositories for edspdf
Users who are interested in edspdf are comparing it to the libraries listed below.
- Modular, fast NLP framework, compatible with Pytorch and spaCy, offering tailored support for French clinical notes. ☆125 Updated last week
- ☆55 Updated last year
- A basic tool that extracts the structure from the PDF files of scientific articles. ☆74 Updated 3 years ago
- Tools for interactive visual exploration of semantic embeddings. ☆35 Updated 11 months ago
- 💥 Use Hugging Face text and token classification pipelines directly in spaCy ☆63 Updated last year
- SpaCyEx allows the creation of spaCy Matcher patterns with RegEx like syntax. ☆59 Updated last year
- Python package for deduplication/entity resolution using active learning ☆81 Updated 11 months ago
- PDF parser powered by grobid ☆28 Updated last year
- Open Access PDF harvester, metadata aggregator and full-text ingester ☆62 Updated last year
- Bagpipes spaCy is a collection of custom spaCy pipeline components designed to enhance text processing capabilities. ☆18 Updated 11 months ago
- A spaCy wrapper for GliNER ☆118 Updated 6 months ago
- Viewer for the structure extracted by Grobid on PDF documents ☆52 Updated 3 months ago
- Confection: the sweetest config system for Python ☆188 Updated 3 months ago
- 🔢 Work with static vector models ☆28 Updated 3 months ago
- A Flexible Deep Learning Approach to Fuzzy String Matching ☆146 Updated 9 months ago
- Robust and fast topic models with sentence-transformers. ☆76 Updated last month
- XAI based human-in-the-loop framework for automatic rule-learning. ☆49 Updated last year
- An open-source package for python to clean raw text data ☆70 Updated 2 years ago
- A Python pipeline tool and plugin ecosystem for processing technical documents. Process papers from arXiv, SemanticScholar, PDF, with GRO… ☆52 Updated 4 months ago
- Next-generation Punkt sentence boundary detection with zero dependencies ☆17 Updated 4 months ago
- One downloader for many scientific data and code repositories! DOI Data ☆75 Updated last week
- Scientific Document Insight Q/A ☆29 Updated last month
- spaCy extension for Visual Studio Code ☆32 Updated 4 months ago
- A spaCy wrapper of Entity-Fishing (component) for named entity disambiguation and linking on Wikidata ☆164 Updated 2 years ago
- A python library for the Semantic Scholar (S2) API with typed pydantic objects and various nifty functionalities. ☆22 Updated 4 years ago
- A PyPI package for easy text annotation in a Jupyter Notebook. ☆28 Updated 4 years ago
- 🖍️ Highlight text in documents ☆109 Updated 3 months ago
- Named entity recognition for the legal domain ☆42 Updated 4 years ago
- This repository contains an easy and intuitive approach to use SetFit in combination with spaCy. ☆80 Updated last year
- Preprocessing and analysis for training SNOMED-CT concept embeddings from CORD-19 corpus ☆15 Updated 2 years ago