moj-analytical-services / airflow-pdf2embeddings
NLP tool for scraping text from a corpus of PDF files, embedding the sentences in the text and finding semantically similar sentences to a given search query.
★36 · Updated 2 years ago
Related projects
Alternatives and complementary repositories for airflow-pdf2embeddings
- Use Hugging Face text and token classification pipelines directly in spaCy ★62 · Updated 8 months ago
- Extracting Semi-Structured Data from PDFs on a large scale ★51 · Updated 2 years ago
- ★53 · Updated 10 months ago
- spaCy pipeline component for generating spaCy KnowledgeBase Alias Candidates for Entity Linking ★85 · Updated 2 years ago
- Semantic Scholar's Author Disambiguation Algorithm & Evaluation Suite ★90 · Updated 10 months ago
- Tools for interactive visual exploration of semantic embeddings. ★29 · Updated 2 months ago
- Python-based Wikidata framework for easy dataframe extraction ★39 · Updated 11 months ago
- spaCy powered Label Studio ML backend ★30 · Updated last year
- A spaCy wrapper of Entity-Fishing (component) for named entity disambiguation and linking on Wikidata ★153 · Updated 2 years ago
- Summarize. is a Streamlit application that performs automatic text summarization using both extractive and abstractive models. ★15 · Updated 3 years ago
- A basic tool that extracts the structure from the PDF files of scientific articles. ★74 · Updated 2 years ago
- The official tool for transforming doccano format into common dataset formats. ★105 · Updated last year
- Easy PDF to text to spaCy text extraction in Python. ★34 · Updated last month
- Link raw affiliations to ROR IDs ★25 · Updated last year
- A Benchmark of PDF Information Extraction Tools using a Multi-Task and Multi-Domain Evaluation Framework for Academic Documents ★19 · Updated last year
- A simple web application for searching Word2Vec embeddings derived from approximately 2,000 law reports published by the Incorporated… ★25 · Updated 2 years ago
- ★15 · Updated 3 years ago
- A Flexible Deep Learning Approach to Fuzzy String Matching ★139 · Updated last month
- Discourse Analysis Tool Suite ★17 · Updated this week
- Using Natural Language Processing to standardize Company Names ★12 · Updated 3 years ago
- Metadata Extractor & Loader (MEL) - The NLP-NER Toolkit (TNNT) ★22 · Updated last year
- A spaCy wrapper of OpenTapioca for named entity linking on Wikidata ★91 · Updated last year
- Push your spaCy pipelines to the Hugging Face Hub ★43 · Updated 5 months ago
- Data and additional information regarding the paper: Contract Discovery. Dataset and a Few-Shot Semantic Retrieval Challenge with Competi… ★29 · Updated 4 years ago
- A library of tools for dictionary-based Named Entity Recognition (NER), based on word vector representations to expand dictionary terms. ★24 · Updated last year
- HDBSCAN Tuning for BERTopic Models ★42 · Updated last year
- A toolkit for automatically extracting semantic information from PDF files of scientific articles ★65 · Updated 11 months ago
- Open Access PDF harvester, metadata aggregator and full-text ingester ★55 · Updated 6 months ago