opensemanticsearch / open-semantic-search
Open Source research tool to search, browse, analyze and explore large document collections by Semantic Search Engine and Open Source Text Mining & Text Analytics platform (Integrates ETL for document processing, OCR for images & PDF, named entity recognition for persons, organizations & locations, metadata management by thesaurus & ontologies, …
☆1,046 · Updated 2 months ago
Alternatives and similar repositories for open-semantic-search
Users that are interested in open-semantic-search are comparing it to the libraries listed below
- Python based Open Source ETL tools for file crawling, document processing (text extraction, OCR), content analysis (Entity Extraction & N… ☆268 · Updated 2 years ago
- Carrot2: Text Clustering Algorithms and Applications ☆814 · Updated last month
- Information Integration Tool ☆600 · Updated 2 months ago
- A tool for converting PDF into hOCR with text, tables, and figures being recognized and preserved. ☆450 · Updated last year
- Websites crawler with built-in exploration and control web interface ☆354 · Updated this week
- A self-hosted search engine for documents. ☆638 · Updated this week
- Heuristic based boilerplate removal tool ☆785 · Updated 4 months ago
- Data model and processing tools for investigative entity data ☆236 · Updated this week
- 🏭 PDF text extraction pipeline: self-hosted, local-first, Docker-based ☆322 · Updated last year
- Python/Django based webapps and web user interfaces for search, structure (meta data management like thesaurus, ontologies, annotations a… ☆99 · Updated 2 years ago
- Textricator is a tool to extract text from documents and generate structured data. ☆345 · Updated 3 months ago
- Open Semantic Visual Linked Data Graph Explorer: Open Source tool (web app) and user interface (UI) for discovery, exploration and visuali… ☆82 · Updated 5 years ago
- A curated list of ontology things ☆388 · Updated last month
- Entity resolution for Elasticsearch. ☆160 · Updated 5 months ago
- A curated list of resources for graph databases and graph computing tools ☆1,217 · Updated 2 years ago
- PDF to XML ALTO file converter ☆244 · Updated 2 weeks ago
- An Apache Spark framework for easy data processing, extraction as well as derivation for web archives and archival collections, developed… ☆150 · Updated last week
- Annif is a multi-algorithm automated subject indexing tool for libraries, archives and museums. ☆231 · Updated last week
- Blazegraph High Performance Graph Database ☆942 · Updated 2 years ago
- Content ExtRactor and MINEr ☆497 · Updated 3 years ago
- ACHE is a web crawler for domain-specific search. ☆469 · Updated last year
- A machine learning tool for fishing entities ☆264 · Updated last month
- Social Feed Manager user interface application. ☆155 · Updated last year
- News crawling with StormCrawler - stores content as WARC ☆350 · Updated 4 months ago
- Just the facts -- web page content extraction ☆1,268 · Updated 11 months ago
- Javascript scraping module based on puppeteer for many different search engines... ☆561 · Updated 2 years ago
- Open Source REST API for named entity extraction, named entity linking, named entity disambiguation, recommendation & reconciliation of e… ☆194 · Updated 2 years ago
- The low-code Knowledge Graph application platform. Apache license. ☆542 · Updated this week
- Streaming WARC/ARC library for fast web archive IO ☆416 · Updated 6 months ago
- brozzler - distributed browser-based web crawler ☆720 · Updated 2 weeks ago