moj-analytical-services / airflow-pdf2embeddings
NLP tool for scraping text from a corpus of PDF files, embedding the sentences in the text, and finding sentences semantically similar to a given search query.
☆36 · Updated 3 years ago
Alternatives and similar repositories for airflow-pdf2embeddings
Users interested in airflow-pdf2embeddings are comparing it to the libraries listed below.
- PatZilla is a modular patent information research platform and data integration toolkit with a modern user interface and access to multip… ☆109 · Updated 2 months ago
- A basic tool that extracts the structure from the PDF files of scientific articles. ☆76 · Updated 3 years ago
- Using Natural Language Processing to standardize Company Names ☆11 · Updated 4 years ago
- spaCy powered Label Studio ML backend ☆31 · Updated 2 years ago
- Text analysis with networks. ☆290 · Updated last week
- 🏖 TagEditor - Annotation tool for spaCy ☆193 · Updated 3 years ago
- Fuzzy matching and more functionality for spaCy. ☆259 · Updated last year
- The dataset used to evaluate JobBERT on the task of job title normalization. ☆27 · Updated 3 years ago
- A spaCy wrapper of Entity-Fishing (component) for named entity disambiguation and linking on Wikidata ☆168 · Updated 3 years ago
- ☆55 · Updated last year
- Spacy NER annotator using ipywidgets ☆123 · Updated last year
- Dataframe Integration with spaCy. ☆103 · Updated 4 years ago
- German sentiment scores with SentiWS as extension for spaCy ☆38 · Updated 2 years ago
- A list of selected resources, methods, and tools dedicated to Legal Text Analytics. ☆685 · Updated last year
- Mastering spaCy, published by Packt ☆136 · Updated last week
- Extracting Semi-Structured Data from PDFs on a large scale ☆52 · Updated 3 years ago
- The WIPO Manual on Open Source Patent Analytics ☆56 · Updated 3 years ago
- 📂 Additional lookup tables and data resources for spaCy ☆112 · Updated 5 months ago
- This repository contains an easy and intuitive approach to few-shot classification using sentence-transformers or spaCy models, or zero-s… ☆220 · Updated 10 months ago
- Nesta's Skills Extractor Library ☆147 · Updated 5 months ago
- Information extraction from English and German texts based on predicate logic ☆139 · Updated 2 years ago
- 💥 Use Hugging Face text and token classification pipelines directly in spaCy ☆63 · Updated last year
- A Flexible Deep Learning Approach to Fuzzy String Matching ☆147 · Updated last year
- SpacyV3 Text Categorizer Tutorial ☆17 · Updated 5 years ago
- Python client for EPO OPS, the European Patent Office's Open Patent Services API. ☆170 · Updated last week
- ☆41 · Updated last year
- HDBSCAN Tuning for BERTopic Models ☆49 · Updated 2 years ago
- A curated list of resources on document similarity measures (papers, tutorials, code, ...) ☆253 · Updated 3 years ago
- A Dataset of German Legal Documents for Named Entity Recognition ☆172 · Updated 3 years ago
- Scripts used to make and evaluate OpenAlex's concept tagging model ☆52 · Updated 2 years ago