JingheZ / TextMining

This project has two major tasks: text data processing and text categorization. For text data processing, we perform tokenization, stemming, normalization, etc., and use a vector space model and statistical language models to retrieve documents similar to a query. For text categorization, we build a text classification system which i…

☆8

Alternatives and similar repositories for TextMining

Users who are interested in TextMining are comparing it to the repositories listed below.

sunishsheth2009 / ChatterBot
Uses Python, Flask, natural language processing, SQLAlchemy, NLTK, and Beautiful Soup for web scraping.
☆9 Updated 4 years ago
elplatt / lda-gibbs-em
Latent Dirichlet Allocation with Gibbs sampling
☆16 Updated 11 years ago
shenzhun / creating-enron-spam-corpus-from-raw-data
Uses raw data from the Enron spam datasets to create a corpus with Python, NLTK, and shell scripts.
☆8 Updated 11 years ago
pxnguyen / videotext
Text Detection and Recognition in Video
☆11 Updated 11 years ago
turian / pytextpreprocess
Preprocess text for NLP (tokenizing, lowercasing, stemming, sentence splitting, etc.)
☆29 Updated 14 years ago
ankazhao / python-sparselda
A Latent Dirichlet Allocation topic modeling package based on SparseLDA Gibbs Sampling inference algorithm
☆8 Updated 12 years ago
ianozsvald / social_media_brand_disambiguator
Brand disambiguator for tweets to differentiate e.g. Orange vs orange (brand vs foodstuff), using NLTK and scikit-learn
☆57 Updated 12 years ago
arne-cl / nltk-maxent-pos-tagger
Maximum-entropy-based part-of-speech tagger for NLTK
☆45 Updated 8 years ago
luchux / ipython-notebook-nltk
An introduction to natural language processing using NLTK with Python.
☆19 Updated 3 years ago
wdickers / Focused_Crawler
Focused Crawler for VT's CTRNet
☆10 Updated 12 years ago
ryhan / NLP-project
11411 Natural Language Processing Final Project. Reads Wikipedia articles, and then can both answer natural-language questions about the …
☆22 Updated 12 years ago
knowitall / chunkedextractor
Extractors whose input is a chunked sentence. Includes Relnoun, Nesty, and a Scala interface for ReVerb.
☆28 Updated 7 years ago
chriskelvinlee / trivial_pursuit
Homebrew implementation of IBM Watson DeepQA (NLTK, Semantic Web, AI strategies)
☆16 Updated 13 years ago
japerk / PyCon-NLTK-Tutorial
☆49 Updated 13 years ago
rsennrich / SMORLemma
SMOR (Stuttgart Morphology) with alternative lemmatization component
☆12 Updated last year
renepickhardt / generalized-language-modeling-toolkit
Generalized Language Modeling toolkit
☆51 Updated 3 years ago
davidthaler / Greek_media
This repo holds the code for the 10th place entry in the 2014 WISE/Greek Media Multi-label Classification competition hosted on Kaggle.
☆13 Updated 10 years ago
markhatton / google-ngrams
Shell scripts to assist downloading & processing the Google n-grams corpora
☆14 Updated 8 years ago
aimannajjar / columbiau-rocchio-search-query-expander
Implements Rocchio Query Expansion - similar to "related searches:" found at popular search engines but based on relevant documents selec…
☆20 Updated 8 years ago
gouwsmeister / TextCleanser
Normalizes lexically ill-formed text to its most likely clean text, e.g. "c u thr 2nite!" -> "see you there tonight!".
☆63 Updated 9 years ago
aalto-speech / flatcat
Morfessor FlatCat
☆13 Updated 5 years ago
blei-lab / turbotopics
Turbo topics find significant multiword phrases in topics.
☆46 Updated 10 years ago
wpm / Naive-Bayes-Gibbs-Sampler
Gibbs sampler for a Naive Bayes document classifier
☆24 Updated 12 years ago
ethnhll / FilippovaCompression
Implementation of the algorithm described in "Multi-sentence compression: Finding shortest paths in word graphs" by Katja Filippova.
☆12 Updated 10 years ago
3003 / Text-Retrieval-Python
Collection of functions and scripts for text retrieval in Python: Document collection preprocessing, Feature Selection, Indexing, Query p…
☆43 Updated 12 years ago
zygmuntz / classifying-text
Classifying text with bag-of-words
☆113 Updated 10 years ago
rsennrich / zmorge
Zurich Morphological Lexicon for German: a tool to extract a morphological lexicon from Wiktionary
☆11 Updated last year
bohana / sentlex
Tools and Libraries for Lexicon-Based Sentiment Analysis
☆24 Updated 8 years ago
max99x / crystal
Natural Language Q/A app using DRT.
☆34 Updated 14 years ago
erickrf / ptwiki2text
Python scripts to read a Portuguese Wikipedia XML dump file, parse it and generate plain text files.
☆14 Updated 11 years ago