moj-analytical-services / airflow-pdf2embeddings
NLP tool for scraping text from a corpus of PDF files, embedding the sentences in the text, and finding sentences semantically similar to a given search query.
☆36Updated 3 years ago
Alternatives and similar repositories for airflow-pdf2embeddings
Users who are interested in airflow-pdf2embeddings are comparing it to the libraries listed below:
- A basic tool that extracts the structure from the PDF files of scientific articles.☆75Updated 3 years ago
- A Flexible Deep Learning Approach to Fuzzy String Matching☆148Updated 11 months ago
- Python client for GROBID Web services☆364Updated last week
- Annif is a multi-algorithm automated subject indexing tool for libraries, archives and museums.☆237Updated last week
- Spacy NER annotator using ipywidgets☆122Updated last year
- A simple toolkit for conducting analyses using corpus methods☆26Updated 3 years ago
- Fuzzy matching and more functionality for spaCy.☆258Updated last year
- A spaCy wrapper of Entity-Fishing (component) for named entity disambiguation and linking on Wikidata☆164Updated 2 years ago
- 💥 Use Hugging Face text and token classification pipelines directly in spaCy☆63Updated last year
- spaCy powered Label Studio ML backend☆31Updated 2 years ago
- Using Natural Language Processing to standardize Company Names☆12Updated 4 years ago
- 🏖TagEditor - Annotation tool for spaCy☆193Updated 3 years ago
- Python library for the OpenAlex HTTP API☆23Updated 2 years ago
- Find legal citations in any block of text☆174Updated last week
- Finding mentions and citations to named and implicit research datasets from within the academic literature☆29Updated 3 months ago
- Scripts used to make and evaluate OpenAlex's concept tagging model☆50Updated 2 years ago
- Semantic Scholar's Author Disambiguation Algorithm & Evaluation Suite☆95Updated last month
- Open Access PDF harvester, metadata aggregator and full-text ingester☆63Updated last year
- Text analysis with networks.☆288Updated 2 weeks ago
- Dataframe Integration with spaCy.☆103Updated 4 years ago
- link raw affiliation to ROR ids☆30Updated 2 years ago
- A Dataset of German Legal Documents for Named Entity Recognition☆172Updated 2 years ago
- Service for converting and enhancing heterogeneous publisher XML formats into TEI☆57Updated last year
- LexPredict Legal Dictionaries☆127Updated 3 years ago
- ☆55Updated last year
- spaCy extension for Visual Studio Code☆32Updated 7 months ago
- Library for unit extraction - fork of quantulum for python3☆142Updated last year
- A Python library for calculating a large variety of metrics from text☆350Updated 9 months ago
- Scripts for Medium articles☆62Updated last year
- HDBSCAN Tuning for BERTopic Models☆49Updated 2 years ago