opensemanticsearch / open-semantic-search
Open Source research tool to search, browse, analyze and explore large document collections by Semantic Search Engine and Open Source Text Mining & Text Analytics platform (Integrates ETL for document processing, OCR for images & PDF, named entity recognition for persons, organizations & locations, metadata management by thesaurus & ontologies, …
☆1,030 Updated 3 weeks ago
Alternatives and similar repositories for open-semantic-search
Users who are interested in open-semantic-search are comparing it to the libraries listed below.
- Python based Open Source ETL tools for file crawling, document processing (text extraction, OCR), content analysis (Entity Extraction & N… ☆268 Updated 2 years ago
- Ambar: Document Search Engine ☆1,950 Updated 3 years ago
- Open-source Enterprise Grade Search Engine Software ☆507 Updated 2 years ago
- Just the facts -- web page content extraction ☆1,265 Updated 10 months ago
- Open Source REST API for named entity extraction, named entity linking, named entity disambiguation, recommendation & reconciliation of e… ☆194 Updated 2 years ago
- Improve your OpenSearch, Elasticsearch, Solr, Vectara, Algolia and Custom Search search quality. ☆304 Updated this week
- Carrot2: Text Clustering Algorithms and Applications ☆805 Updated 2 months ago
- Language, Knowledge, Cognition ☆603 Updated 2 months ago
- YAGO is a large semantic knowledge base, derived from Wikipedia, WordNet, WikiData, GeoNames, and other data sources ☆736 Updated 2 years ago
- A cross-platform command line tool for parallelised content extraction and analysis. ☆245 Updated this week
- The software used to extract structured data from Wikipedia ☆897 Updated 2 months ago
- Content ExtRactor and MINEr ☆493 Updated 2 years ago
- Open Semantic Visual Linked Data Graph Explorer: Open Source tool (web app) and user interface (UI) for discovery, exploration and visuali… ☆84 Updated 5 years ago
- NBoost is a scalable, search-api-boosting platform for deploying transformer models to improve the relevance of search results on differe… ☆676 Updated 4 years ago
- A machine learning tool for fishing entities ☆264 Updated this week
- Science Parse parses scientific papers (in PDF form) and returns them in structured form. ☆655 Updated 11 months ago
- PDF to XML ALTO file converter ☆238 Updated this week
- 🏭 PDF text extraction pipeline: self-hosted, local-first, Docker-based ☆316 Updated last year
- A tool for converting PDF into hOCR with text, tables, and figures being recognized and preserved. ☆444 Updated last year
- Python/Django based webapps and web user interfaces for search, structure (meta data management like thesaurus, ontologies, annotations a… ☆97 Updated 2 years ago
- LexNLP by LexPredict ☆721 Updated 11 months ago
- A set of tools to allow PDF to XML conversion, utilising Apache Beam and other tools. The aim of this project is to bring multiple tools… ☆295 Updated 3 years ago
- 🦙 Integrating LLMs into structured NLP pipelines ☆1,245 Updated 4 months ago
- Blazegraph High Performance Graph Database ☆933 Updated 2 years ago
- Article extraction benchmark: dataset and evaluation scripts ☆315 Updated last year
- brozzler - distributed browser-based web crawler ☆708 Updated this week
- ☆339 Updated 2 years ago
- A knowledge base construction engine for richly formatted data ☆410 Updated 3 years ago
- Python implementation of TextRank algorithms ("textgraphs") for phrase extraction ☆2,174 Updated 10 months ago
- Textricator is a tool to extract text from documents and generate structured data. ☆346 Updated 2 months ago