aphp / edspdf
EDS-PDF is a generic, pure-Python framework for text extraction from PDF documents. It provides the machinery to use rule- or machine-learning-based approaches to classify text blocks as body or metadata.
☆57Updated 8 months ago
Alternatives and similar repositories for edspdf
Users who are interested in edspdf also compare it to the libraries listed below.
- Modular, fast NLP framework, compatible with PyTorch and spaCy, offering tailored support for French clinical notes.☆141Updated this week
- Python package for deduplication/entity resolution using active learning☆81Updated last year
- 💥 Use Hugging Face text and token classification pipelines directly in spaCy☆63Updated last year
- ☆55Updated last year
- Next-generation Punkt sentence boundary detection with zero dependencies☆20Updated 2 months ago
- SpaCyEx allows the creation of spaCy Matcher patterns with RegEx-like syntax.☆59Updated last year
- Tools for interactive visual exploration of semantic embeddings.☆38Updated last year
- A basic tool that extracts the structure from the PDF files of scientific articles.☆76Updated 3 years ago
- A TextBlob sentiment analysis pipeline component for spaCy.☆56Updated last year
- PDF parser powered by grobid☆28Updated last year
- 🦦 weasel: A small and easy workflow system☆87Updated last year
- An open-source package for Python to clean raw text data☆72Updated 2 years ago
- Aim-spaCy integration☆35Updated 2 years ago
- Bagpipes spaCy is a collection of custom spaCy pipeline components designed to enhance text processing capabilities.☆21Updated last year
- An easy way to chunk spaCy docs.☆22Updated last year
- multimodal document analysis☆166Updated last year
- A Python library to de-identify medical records with state-of-the-art NLP methods.☆140Updated last year
- Tool to apply Legal Matter Specification Standard (LMSS) to documents☆12Updated last year
- Scientific Document Insight Q/A☆31Updated 2 months ago
- 🖍️ Highlight text in documents☆109Updated 6 months ago
- A simple library for segmenting legal texts☆17Updated 2 years ago
- A Python package to simulate typographical errors.☆38Updated last year
- Plug-and-play, zero-shot document processing pipelines.☆109Updated last week
- A spaCy wrapper for GliNER☆123Updated 9 months ago
- ☆28Updated last year
- Robust and fast topic models with sentence-transformers.☆80Updated last week
- XAI-based human-in-the-loop framework for automatic rule-learning.☆49Updated last year
- Evaluation framework for document processing models and services.☆52Updated this week
- 🔢 Work with static vector models☆31Updated 6 months ago
- A deep learning model for extracting references from text☆29Updated 2 years ago