aphp / edspdf
EDS-PDF is a generic, pure-Python framework for text extraction from PDF documents. It provides the machinery to use rule-based or machine-learning-based approaches to classify text blocks as either body text or metadata.
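To make the rule-based side of this idea concrete, here is a minimal sketch in plain Python. It does not use the edspdf API: the `TextBlock` class, the `classify_block` and `extract_body` functions, and the margin thresholds are all hypothetical, and a real pipeline would read block geometry from a PDF parser. The sketch labels any block lying entirely in the header or footer band of the page as metadata and keeps the rest as body text.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TextBlock:
    """Hypothetical text block: its text plus a bounding box in
    page-relative coordinates (0 = top/left edge, 1 = bottom/right edge)."""
    text: str
    x0: float
    y0: float
    x1: float
    y1: float


def classify_block(block: TextBlock, top_margin: float = 0.1,
                   bottom_margin: float = 0.9) -> str:
    """Return 'meta' for blocks entirely inside the header or footer band,
    'body' otherwise. The margin values are illustrative assumptions."""
    # The whole block sits above the top margin: treat it as a page header.
    if block.y1 <= top_margin:
        return "meta"
    # The whole block sits below the bottom margin: treat it as a page footer.
    if block.y0 >= bottom_margin:
        return "meta"
    return "body"


def extract_body(blocks: List[TextBlock]) -> str:
    """Keep body blocks only, sort them top-to-bottom then left-to-right,
    and join their text with newlines."""
    body = [b for b in blocks if classify_block(b) == "body"]
    body.sort(key=lambda b: (b.y0, b.x0))
    return "\n".join(b.text for b in body)


if __name__ == "__main__":
    page = [
        TextBlock("Hospital X - Department Y", 0.1, 0.02, 0.9, 0.06),
        TextBlock("Patient admitted for chest pain.", 0.1, 0.2, 0.9, 0.3),
        TextBlock("Page 1/2", 0.45, 0.95, 0.55, 0.98),
    ]
    print(extract_body(page))
```

Hand-written geometric rules like these are the simplest baseline. The machine-learning route mentioned above would replace `classify_block` with a trained classifier over features such as position, font, and text content.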
☆44 · Updated last week
Alternatives and similar repositories for edspdf:
Users interested in edspdf are comparing it to the libraries listed below.
- Modular, fast NLP framework, compatible with Pytorch and spaCy, offering tailored support for French clinical notes. ☆120 · Updated this week
- EDS-Pseudo is a hybrid model for detecting personally identifying entities in clinical reports ☆52 · Updated 2 weeks ago
- A YAML parser with advanced functionalities to ease your application configuration ☆33 · Updated 2 weeks ago
- eds-scikit is a Python library providing tools to process and analyse OMOP data ☆38 · Updated 2 months ago
- 💥 Use Hugging Face text and token classification pipelines directly in spaCy ☆63 · Updated 11 months ago
- Tools for interactive visual exploration of semantic embeddings. ☆30 · Updated 5 months ago
- SpaCyEx allows the creation of spaCy Matcher patterns with RegEx like syntax. ☆59 · Updated 9 months ago
- A spaCy custom component that extracts and normalizes temporal expressions ☆54 · Updated 2 years ago
- Python package for deduplication/entity resolution using active learning ☆76 · Updated 5 months ago
- 🧪 Cutting-edge experimental spaCy components and features ☆96 · Updated 9 months ago
- Question-answering annotation platform ☆90 · Updated 3 weeks ago
- spaCy-wrap is a wrapper library for spaCy for including fine-tuned transformers from Huggingface in your spaCy pipeline allowing you to i… ☆46 · Updated 10 months ago
- Annotator building tool for Jupyter ☆21 · Updated 11 months ago
- The CleanCoNLL dataset from our EMNLP 2023 paper where we corrected annotation errors and inconsistencies in CoNLL-03. ☆23 · Updated 7 months ago
- 🤗 Push your spaCy pipelines to the Hugging Face Hub ☆44 · Updated 8 months ago
- Aim-spaCy integration ☆34 · Updated last year
- Bagpipes spaCy is a collection of custom spaCy pipeline components designed to enhance text processing capabilities. ☆14 · Updated 6 months ago
- A simple web application for searching Word2Vec embeddings derived from approximately 2,000 law reports published by The Incorporated… ☆26 · Updated 2 years ago
- Blue Brain text mining toolbox for semantic search and structured information extraction ☆44 · Updated last year
- XAI based human-in-the-loop framework for automatic rule-learning. ☆48 · Updated 7 months ago
- Tool to apply Legal Matter Specification Standard (LMSS) to documents ☆12 · Updated 6 months ago
- This repository is now archived. Further development has been moved to https://github.com/medkit-lib/medkit. ☆24 · Updated last year
- Code for SaGe subword tokenizer (EACL 2023) ☆22 · Updated 2 months ago
- Small python package to measure OCR quality and other related metrics. ☆21 · Updated last year
- Fact checking baseline combining dense retrieval and textual entailment ☆28 · Updated last month
- A Python library aimed at dissecting and augmenting NER training data. ☆58 · Updated last year
- Discourse Analysis Tool Suite ☆18 · Updated this week
- Named entity recognition for the legal domain ☆41 · Updated 3 years ago
- Preprocessing and analysis for training SNOMED-CT concept embeddings from CORD-19 corpus ☆14 · Updated last year