opensemanticsearch / open-semantic-search
Open Source research tool to search, browse, analyze and explore large document collections by Semantic Search Engine and Open Source Text Mining & Text Analytics platform (Integrates ETL for document processing, OCR for images & PDF, named entity recognition for persons, organizations & locations, metadata management by thesaurus & ontologies, …
☆1,014Updated 2 years ago
Alternatives and similar repositories for open-semantic-search:
Users who are interested in open-semantic-search are comparing it to the libraries listed below:
- Python based Open Source ETL tools for file crawling, document processing (text extraction, OCR), content analysis (Entity Extraction & N…☆266Updated 2 years ago
- Carrot2: Text Clustering Algorithms and Applications☆799Updated last month
- Open Semantic Visual Linked Data Graph Explorer: Open Source tool (web app) and user interface (UI) for discovery, exploration and visuali…☆84Updated 5 years ago
- Open Source REST API for named entity extraction, named entity linking, named entity disambiguation, recommendation & reconciliation of e…☆194Updated 2 years ago
- 🏭 PDF text extraction pipeline: self-hosted, local-first, Docker-based☆314Updated last year
- Textricator is a tool to extract text from documents and generate structured data.☆347Updated 2 weeks ago
- Python/Django based webapps and web user interfaces for search, structure (meta data management like thesaurus, ontologies, annotations a…☆97Updated 2 years ago
- The software used to extract structured data from Wikipedia☆891Updated last month
- LexNLP by LexPredict☆715Updated 10 months ago
- Just the facts -- web page content extraction☆1,262Updated 8 months ago
- brozzler - distributed browser-based web crawler☆693Updated last week
- PDF to XML ALTO file converter☆233Updated 2 weeks ago
- LexPredict ContraxSuite☆168Updated 2 years ago
- LexPredict Legal Dictionaries☆115Updated 2 years ago
- YAGO is a large semantic knowledge base, derived from Wikipedia, WordNet, WikiData, GeoNames, and other data sources☆734Updated 2 years ago
- Run a high-fidelity browser-based web archiving crawler in a single Docker container☆726Updated this week
- Annif is a multi-algorithm automated subject indexing tool for libraries, archives and museums.☆217Updated this week
- Heuristic based boilerplate removal tool☆764Updated last month
- A cross-platform command line tool for parallelised content extraction and analysis.☆243Updated this week
- Core Python Web Archiving Toolkit for replay and recording of web archives☆1,475Updated this week
- ACHE is a web crawler for domain-specific search.☆464Updated last year
- Judgment citation annotations for the National Archives Find Case Law service☆22Updated this week
- Ambar: Document Search Engine☆1,948Updated 3 years ago
- Blazegraph High Performance Graph Database☆921Updated last year
- NBoost is a scalable, search-api-boosting platform for deploying transformer models to improve the relevance of search results on differe…☆678Updated 4 years ago
- Science-parse version 2☆240Updated 5 years ago
- Streaming WARC/ARC library for fast web archive IO☆408Updated 3 months ago
- A self-hosted search engine for documents.☆623Updated this week
- An open database of international sanctions data, persons of interest and politically exposed persons☆546Updated this week
- Science Parse parses scientific papers (in PDF form) and returns them in structured form.☆652Updated 10 months ago