chrismattmann/tika-similarity

Tika-Similarity uses the Tika-Python package (a Python port of Apache Tika) to compute file similarity based on metadata features.

☆108

Alternatives and similar repositories for tika-similarity

Users interested in tika-similarity are comparing it to the libraries listed below.

chrismattmann / etllib
This is the ETL lib package. It provides an API to munge and prepare JSON, TSV and other data using Apache Tika and JSON parsing/loading …
☆18 · Jan 27, 2024 · Updated 2 years ago
nasa-jpl-memex / image_space
Interactive image similarity, visual search, and retrieval application
☆95 · Apr 16, 2024 · Updated 2 years ago
mccutchen / speculatively
Package speculatively provides a simple mechanism to re-execute a task in parallel only after some initial timeout has elapsed.
☆10 · Jul 11, 2025 · Updated last year
thammegowda / tika-ner-corenlp
Stanford CoreNLP NER addon for Apache Tika's NamedEntityParser
☆13 · Feb 26, 2022 · Updated 4 years ago
USCDataScience / SentimentAnalysisParser
Combines Apache OpenNLP and Apache Tika and provides facilities for automatically deriving sentiment from text.
☆34 · May 3, 2023 · Updated 3 years ago
chrismattmann / trec-dd-polar
A dataset downloaded from the deep and scientific web across three major Polar data centers for use in research.
☆13 · Sep 8, 2017 · Updated 8 years ago
vlall / Moses-API
Simple RESTful API server running your own machine translation model. Docker image modified from mbartoli/easy-smt
☆11 · Apr 28, 2019 · Updated 7 years ago
ericwhyne / darpa_open_catalog
Meta information for the DARPA open catalog project.
☆57 · Nov 16, 2017 · Updated 8 years ago
sachinrjoglekar / MapGeist
Exploring Text, Graphically
☆12 · Mar 27, 2015 · Updated 11 years ago
sematext / activate
Examples for the Activate conference
☆11 · Sep 11, 2019 · Updated 6 years ago
nasa-jpl-memex / weapons
MEMEX Weapons Pilot for the illegal weapons domain.
☆15 · May 20, 2016 · Updated 10 years ago
napsternxg / Kaggle-StackOverflow-Vis
Submission for Stack Exchange Kaggle Visualization Competition
☆16 · Oct 7, 2015 · Updated 10 years ago
fedelemantuano / tika-app-python
Python bindings for Apache Tika
☆24 · Aug 20, 2020 · Updated 5 years ago
nasa-jpl-memex / memex-explorer
Viewers for statistics and dashboarding of Domain Search Engine data
☆128 · Jan 19, 2016 · Updated 10 years ago
sherlok / sherlok
Distributed restful text mining.
☆23 · Jan 19, 2016 · Updated 10 years ago
Benjamin-Hu / Engineering-Drawing-Parser
Engineering Drawing Parser
☆13 · Jan 24, 2019 · Updated 7 years ago
ContinuumIO / nutchpy
For interacting with nutch via Python
☆29 · Jul 5, 2026 · Updated 2 weeks ago
generalmilk / DeepSentiBank
deep version SentiBank
☆12 · Dec 16, 2014 · Updated 11 years ago
mitll / topic-clustering
☆44 · Jan 15, 2016 · Updated 10 years ago
minhptx / iswc-2016-semantic-labeling
☆11 · Apr 24, 2018 · Updated 8 years ago
momer / nutch-selenium-grid-plugin
A Nutch 2.2.1 plugin which allows users to shuffle off the responsibility for retrieving pages to a selenium hub/node spoke system. This …
☆16 · Jun 9, 2016 · Updated 10 years ago
SteveJunGao / 3D_DeepLearning_Resources
Resources for 3D Deep Learning
☆12 · Sep 7, 2017 · Updated 8 years ago
happylun / StyleSimilarity
Source code for the Style Similarity project: measure style similarity between 3D shapes.
☆14 · Mar 10, 2020 · Updated 6 years ago
mitll / vizlinc
Vizlinc
☆15 · Jan 14, 2016 · Updated 10 years ago
aphp / UimaOnSpark
Way to run UIMA pipelines on Apache Spark
☆10 · Jul 19, 2021 · Updated 5 years ago
KBNLresearch / keyword-generator
Command-line tool to extract a ranked list of relevant keywords from a corpus with the option of using either topic modeling or tf-idf sc…
☆41 · Feb 27, 2017 · Updated 9 years ago
ContinuumIO / pydata-apps
Building Python Data Application Tutorials
☆24 · Jun 25, 2026 · Updated 3 weeks ago
javagl / ObjSamples
Samples for the Obj library
☆15 · Feb 12, 2018 · Updated 8 years ago
nasa-jpl-memex / topic_space
Topic modeling web application
☆40 · Jul 23, 2015 · Updated 10 years ago
aviralmathur / Word2Vec
Find Cosine Similarity for Text Documents with Features developed from Word2Vec
☆13 · Aug 19, 2015 · Updated 10 years ago
USCDataScience / sparkler
Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.
☆421 · Mar 30, 2023 · Updated 3 years ago
jamalshahverdiev / sendsmsviasmppapi
Python code to send SMS via SMPP API for Nagios monitored servers
☆11 · Jul 29, 2017 · Updated 8 years ago
dbpedia-spotlight / wikipedia-stats-extractor
Raw Wikipedia counts for entity linking
☆19 · May 19, 2017 · Updated 9 years ago
mrmilosz / hypersolid
JavaScript library with examples for displaying and interacting with a 3D projection of a 4D solid (wireframe).
☆22 · Sep 1, 2018 · Updated 7 years ago
jprichardson / node-path-extra
Node.js: extra methods for the path object.
☆24 · Sep 23, 2019 · Updated 6 years ago
malllabiisc / kg-geometry
☆20 · Apr 25, 2021 · Updated 5 years ago
ubergrape / pycalais
An OpenCalais API Interface for Python.
☆21 · Mar 13, 2012 · Updated 14 years ago
oaqa / suim
Analytic UIMA pipelines using Spark
☆24 · Nov 27, 2015 · Updated 10 years ago
eisenjulian / messenger-bot-nlp
A Facebook Messenger bot sample integrated with built-in NLP from wit.ai
☆13 · Apr 4, 2019 · Updated 7 years ago