duhaime / detect_reuse

Python utilities for detecting textual reuse

☆21

Alternatives and similar repositories for detect_reuse

Users who are interested in detect_reuse are comparing it to the libraries listed below.

willf / segment
A tool to segment text based on frequencies and the Viterbi algorithm "#TheBoyWhoLived" => ['#', 'The', 'Boy', 'Who', 'Lived']
☆81 Updated 9 years ago
mdenil / txtnets
A convolutional neural network library for NLP.
☆59 Updated 7 years ago
semanticize / semanticizest
Standalone Semanticizer
☆32 Updated 10 years ago
Sentimentron / Dracula
A deep, LSTM-based part of speech tagger and sentiment analyser using character embeddings instead of words. Compatible with Theano and T…
☆92 Updated 8 years ago
jimmycallin / pydsm
A Python framework for exploring distributional semantic models.
☆85 Updated 9 years ago
jwieting / paragram-word
Python code for training Paragram word embeddings. These achieve human-level performance on some word similarity tasks including SimLex-9…
☆30 Updated 9 years ago
gouwsmeister / TextCleanser
Normalizes lexically ill-formed text to its most likely clean text, e.g. "c u thr 2nite!" -> "see you there tonight!".
☆63 Updated 10 years ago
npow / RNN-EM
Recurrent Neural Networks with External Memory
☆30 Updated 10 years ago
rspeer / text-as-data
A PyData 2013 talk on straightforward, data-driven ways to handle natural language text in Python.
☆51 Updated 11 years ago
matpalm / rnn_lm
various simple RNNs trained on synthetic grammars
☆30 Updated 10 years ago
leondz / entity_recognition
framework for doing NER and other types of entity recognition, in Python
☆68 Updated 3 years ago
jaredks / tweetokenize
Tokenization and pre-processing for Twitter data used to train classifiers.
☆72 Updated 9 years ago
clips / topbox
Python 2 & 3 wrapper around the Stanford Topic Modeling Toolbox. Intended to be used for hassle-free supervised topic classification with…
☆58 Updated 7 years ago
adamfabish / Reduction
Reduction is a python script which automatically summarizes a text by extracting the sentences which are deemed to be most important.
☆54 Updated 10 years ago
interrogator / corpkit
A toolkit for corpus linguistics
☆206 Updated 6 years ago
turian / pytextpreprocess
Preprocess text for NLP (tokenizing, lowercasing, stemming, sentence splitting, etc.)
☆29 Updated 14 years ago
evanmiltenburg / dm-graphs
Scripts to explore and visualize distributional semantic models using graphs.
☆24 Updated 8 years ago
mbartoli / deep-simplification
Text simplification using RNNs
☆55 Updated 9 years ago
turian / random-indexing-wordrepresentations
Induce word representations using random indexing (RI)
☆29 Updated 15 years ago
xtannier / WebAnnotator
WebAnnotator is a tool for annotating Web pages. WebAnnotator is implemented as a Firefox extension (https://addons.mozilla.org/en-US/fi…
☆48 Updated 3 years ago
WladimirSidorenko / DiscourseSenser
Sense Disambiguation of Connectives for PDTB-Style Discourse Parsing
☆14 Updated 8 years ago
NNBlocks / NNBlocks
A framework to build and train linguistics neural models
☆19 Updated 9 years ago
magsilva / dtm-old
Dynamic Topic Model (based upon code released by David Blei at http://www.cs.princeton.edu/~blei/topicmodeling.html)
☆31 Updated 7 years ago
senarvi / theanolm
TheanoLM is a recurrent neural network language modeling tool implemented using Theano
☆81 Updated last year
semanticize / st
Semanticizest: dump parser and client
☆20 Updated 9 years ago
ethancaballero / Skip-Thought_Memory_Networks
Question Answering system based on Skip-Thought Memory Networks
☆17 Updated 5 years ago
arnicas / word2vec-pride-vis
A hack to replace Pride & Prejudice text with closest word2vec model word, and visualize results.
☆61 Updated 10 years ago
genekogan / text-learning
language + text generation + summarization using Keras and Sumy
☆44 Updated 10 years ago
salmedina / pdf2thumb
This little program generates a thumbnail of a certain pdf for quick visualization. It is based on ImageMagick as it has all the function…
☆17 Updated 3 years ago
jayantj / w2vec-similarity
Scripts and modules used for creating document clusters from word2vec
☆40 Updated 8 years ago