aphp / edspdf
EDS-PDF is a generic, pure-Python framework for text extraction from PDF documents. It provides the machinery to classify text blocks as body or metadata using rule-based or machine-learning-based approaches.
☆41 Updated this week
Related projects
Alternatives and complementary repositories for edspdf
- Modular, fast NLP framework, compatible with PyTorch and spaCy, offering tailored support for French clinical notes. ☆115 Updated this week
- EDS-Pseudo is a hybrid model for detecting personally identifying entities in clinical reports ☆47 Updated last week
- eds-scikit is a Python library providing tools to process and analyse OMOP data ☆36 Updated 4 months ago
- Annotator building tool for Jupyter ☆21 Updated 9 months ago
- SpaCyEx allows the creation of spaCy Matcher patterns with regex-like syntax. ☆57 Updated 6 months ago
- Python package for deduplication/entity resolution using active learning ☆78 Updated 2 months ago
- Confit is a complete and easy-to-use configuration framework aimed at improving the reproducibility of experiments by relying on the Pyth… ☆12 Updated this week
- 💥 Use Hugging Face text and token classification pipelines directly in spaCy ☆62 Updated 8 months ago
- 🧪 Cutting-edge experimental spaCy components and features ☆95 Updated 6 months ago
- Communication about the pseudonymization engine of the Cour de Cassation ☆18 Updated last year
- Preprocessing and analysis for training SNOMED-CT concept embeddings from CORD-19 corpus ☆14 Updated last year
- 🤗 Push your spaCy pipelines to the Hugging Face Hub ☆43 Updated 5 months ago
- A Serverless Text Annotation Tool for Corpus Development ☆54 Updated last month
- A Python library aimed at dissecting and augmenting NER training data. ☆56 Updated last year
- Platform enabling Rapid Annotation for Clinical Entity Recognition ☆50 Updated 2 years ago
- Annotation management for Prodigy that supports multiple users working on many projects ☆15 Updated 6 years ago
- A Python library for creating adversarial splits ☆13 Updated 2 years ago
- spaCy-wrap is a wrapper library for spaCy for including fine-tuned transformers from Hugging Face in your spaCy pipeline allowing you to i… ☆46 Updated 7 months ago
- Finds linguistic patterns effortlessly ☆33 Updated last year
- A PyPI package for easy text annotation in a Jupyter Notebook. ☆28 Updated 3 years ago
- A spaCy custom component that extracts and normalizes temporal expressions ☆52 Updated last year
- The CleanCoNLL dataset from our EMNLP 2023 paper where we corrected annotation errors and inconsistencies in CoNLL-03. ☆19 Updated 4 months ago
- A Python library to de-identify medical records with state-of-the-art NLP methods. ☆120 Updated last year
- A Streamlit component for annotating text by selecting it. ☆40 Updated 5 months ago
- ✖️MEN - A Modular Toolkit for Cross-Lingual Medical Entity Normalization ☆23 Updated 2 weeks ago
- GPTNERMED is a language model-generated, synthetic dataset and an open neural NER model for medical entities designed for German data. ☆15 Updated last year
- An exploratory, tutorial and analytical view of the Unified Medical Language System (UMLS) & the software/technologies provided via being… ☆36 Updated 8 months ago
- A tool for quickly adding labels to unlabeled datasets ☆20 Updated 10 months ago
- spaCy match and replace, maintaining conjugation ☆34 Updated last year
- This repository is now archived. Further development has been moved to https://github.com/medkit-lib/medkit. ☆24 Updated last year