aphp / edspdf
EDS-PDF is a generic, pure-Python framework for text extraction from PDF documents. It provides the machinery to use rule-based or machine-learning-based approaches to classify text blocks as body or metadata.
☆46 · Updated last month
Alternatives and similar repositories for edspdf:
Users who are interested in edspdf are comparing it to the libraries listed below.
- Modular, fast NLP framework, compatible with Pytorch and spaCy, offering tailored support for French clinical notes. ☆121 · Updated last week
- EDS-Pseudo is a hybrid model for detecting personally identifying entities in clinical reports ☆52 · Updated this week
- Confit is a complete and easy-to-use configuration framework aimed at improving the reproducibility of experiments by relying on the Pyth… ☆11 · Updated last week
- eds-scikit is a Python library providing tools to process and analyse OMOP data ☆39 · Updated 3 months ago
- Annotator building tool for Jupyter ☆21 · Updated last month
- SpaCyEx allows the creation of spaCy Matcher patterns with RegEx-like syntax. ☆59 · Updated 10 months ago
- A Serverless Text Annotation Tool for Corpus Development ☆55 · Updated last month
- A spaCy custom component that extracts and normalizes temporal expressions ☆54 · Updated 2 years ago
- An easy-to-use API for analyzing INCEpTION annotation projects. ☆17 · Updated last year
- spaCy-wrap is a wrapper library for spaCy for including fine-tuned transformers from Huggingface in your spaCy pipeline allowing you to i… ☆46 · Updated 11 months ago
- Preprocessing and analysis for training SNOMED-CT concept embeddings from CORD-19 corpus ☆14 · Updated last year
- spaCy pipeline object for extracting values that correspond to a named entity (e.g., birth dates, account numbers, laboratory results) ☆54 · Updated 2 years ago
- The CleanCoNLL dataset from our EMNLP 2023 paper where we corrected annotation errors and inconsistencies in CoNLL-03. ☆23 · Updated 8 months ago
- XAI based human-in-the-loop framework for automatic rule-learning. ☆48 · Updated 8 months ago
- PyTorch extension for handling deeply nested sequences of variable length ☆10 · Updated 3 months ago
- This repository is now archived. Further development has been moved to https://github.com/medkit-lib/medkit. ☆23 · Updated last year
- Python package for deduplication/entity resolution using active learning ☆76 · Updated 7 months ago
- A Python library to de-identify medical records with state-of-the-art NLP methods. ☆128 · Updated last year
- Aim-spaCy integration ☆34 · Updated last year
- Communication about the pseudonymization engine of the Cour de Cassation ☆18 · Updated 2 years ago
- 💥 Use Hugging Face text and token classification pipelines directly in spaCy ☆63 · Updated last year
- Tools for interactive visual exploration of semantic embeddings. ☆32 · Updated 6 months ago
- 🧪 Cutting-edge experimental spaCy components and features ☆98 · Updated 11 months ago
- Sentence tokenizer for clinical/medical text. ☆27 · Updated 9 months ago
- NLP @ TU Wien ☆17 · Updated 3 months ago
- Source code and data for Like a Good Nearest Neighbor ☆28 · Updated 2 months ago
- Bagpipes spaCy is a collection of custom spaCy pipeline components designed to enhance text processing capabilities. ☆15 · Updated 7 months ago
- spaCy match and replace, maintaining conjugation ☆35 · Updated 2 years ago
- spaCy pipeline component for generating spaCy KnowledgeBase Alias Candidates for Entity Linking ☆85 · Updated 2 years ago
- A tool for quickly adding labels to unlabeled datasets ☆20 · Updated last year