opensemanticsearch / open-semantic-search
Open Source research tool to search, browse, analyze and explore large document collections by Semantic Search Engine and Open Source Text Mining & Text Analytics platform (Integrates ETL for document processing, OCR for images & PDF, named entity recognition for persons, organizations & locations, metadata management by thesaurus & ontologies, …
☆1,053 Updated 3 months ago
Alternatives and similar repositories for open-semantic-search
Users who are interested in open-semantic-search are comparing it to the libraries listed below.
- Python based Open Source ETL tools for file crawling, document processing (text extraction, OCR), content analysis (Entity Extraction & N… ☆269 Updated 2 years ago
- Carrot2: Text Clustering Algorithms and Applications ☆814 Updated last week
- Textricator is a tool to extract text from documents and generate structured data. ☆347 Updated 4 months ago
- Open-source Enterprise Grade Search Engine Software ☆508 Updated 2 years ago
- Language, Knowledge, Cognition ☆610 Updated last month
- A self-hosted search engine for documents. Fill out our user survey about structured content: https://forms.gle/PYgusFsoBaMyzUec9 ☆642 Updated last week
- Index Common Crawl archives in tabular format ☆123 Updated 2 months ago
- Websites crawler with built-in exploration and control web interface ☆357 Updated last week
- Just the facts -- web page content extraction ☆1,270 Updated last week
- PDF to XML ALTO file converter ☆247 Updated last week
- Python/Django based webapps and web user interfaces for search, structure (meta data management like thesaurus, ontologies, annotations a… ☆99 Updated 2 years ago
- Judgment citation annotations for the National Archives Find Case Law service ☆22 Updated this week
- Core Python Web Archiving Toolkit for replay and recording of web archives ☆1,532 Updated 2 months ago
- ACHE is a web crawler for domain-specific search. ☆468 Updated last year
- INCEpTION provides a semantic annotation platform offering intelligent annotation assistance and knowledge management. ☆645 Updated this week
- 🏭 PDF text extraction pipeline: self-hosted, local-first, Docker-based ☆323 Updated last year
- Streaming WARC/ARC library for fast web archive IO ☆422 Updated 7 months ago
- A list of memex-related tools and their repository URLs ☆151 Updated 7 years ago
- Annif is a multi-algorithm automated subject indexing tool for libraries, archives and museums. ☆232 Updated last week
- A free tool to OCR a PDF and add a text "layer" in the original file, making a searchable PDF. Use only open source tools. Please tip! ☆291 Updated last month
- Run a high-fidelity browser-based web archiving crawler in a single Docker container ☆831 Updated this week
- A fork of Dragnet that also extracts author, headline, date, keywords from context, as well as built in metadata extraction all in one pac… ☆285 Updated 2 months ago
- Information Integration Tool ☆600 Updated 3 months ago
- Open Source REST API for named entity extraction, named entity linking, named entity disambiguation, recommendation & reconciliation of e… ☆195 Updated 2 years ago
- The software used to extract structured data from Wikipedia ☆901 Updated 5 months ago
- A search interface and wayback machine for the UKWA Solr based warc-indexer framework. ☆117 Updated last week
- Heuristic based boilerplate removal tool ☆786 Updated 4 months ago
- brozzler - distributed browser-based web crawler ☆724 Updated this week
- A command-line tool for using CommonCrawl Index API at http://index.commoncrawl.org/ ☆195 Updated 6 years ago
- News crawling with StormCrawler - stores content as WARC ☆351 Updated 5 months ago