gr33ndata / dmoz-urlclassifier
Preparing DMOZ dataset for my n-Gram LM-based URL classification research
☆32Updated 10 years ago
Alternatives and similar repositories for dmoz-urlclassifier:
Users that are interested in dmoz-urlclassifier are comparing it to the libraries listed below
- Algorithms for URL Classification☆19Updated 9 years ago
- Query-Document Relevance☆42Updated 10 years ago
- Python wrapper for Apache OpenNLP tools☆34Updated 8 years ago
- CogComp's light-weight Python NLP annotators☆115Updated 5 years ago
- ☆21Updated 8 years ago
- A pyLucene-based search module for searching books from goodreads.com☆26Updated 7 years ago
- A project for clustering text streams using locality-sensitive hashing (LSH) in Python☆26Updated 13 years ago
- [NO LONGER MAINTAINED AS OPEN SOURCE - USE SCALETEXT.COM INSTEAD]☆108Updated 11 years ago
- Knowledge extraction from web data☆92Updated 6 years ago
- Wikipedia API wrapper for humans and elk. (en.wikipedia.org/w/api.php, get it?)☆36Updated 10 years ago
- Reduction is a Python script that automatically summarizes a text by extracting the sentences deemed most important.☆55Updated 9 years ago
- Detecting near duplicates using Moses Charikar's algorithm☆20Updated 10 years ago
- A Python module to fetch and parse results from different search engines.☆77Updated 6 years ago
- Python wrapper around SVDLIBC, a fast library for sparse Singular Value Decomposition☆55Updated 11 years ago
- This is a Python binding to the tokenizer Ucto. Tokenisation is one of the first steps in almost any Natural Language Processing task, yet…☆29Updated last month
- Active Learning for text classification using scikit-learn☆23Updated 5 years ago
- Non-Overlapping Aho-Corasick Python extension, for Python 2 (str and unicode) and Python 3☆51Updated 9 years ago
- Python API for Various DB-Backed Simhash Clusters☆64Updated 7 years ago
- An Apache Lucene TokenFilter that uses word2vec vectors for term expansion.☆24Updated 10 years ago
- Labeled examples from wiki dumps in Python☆67Updated 8 years ago
- Simple factoid question answering system☆23Updated 9 years ago
- Python bindings for libwapiti☆66Updated 5 years ago
- A pure python implementation of locality sensitive hashing for text documents☆86Updated 9 years ago
- Repository for the CLiPS HAte speech DEtection System [HADES].☆24Updated 6 years ago
- An open relation extraction system☆46Updated 3 years ago
- SVO extraction using NLTK☆37Updated 5 years ago
- Python search module for fast approximate string matching☆54Updated 2 years ago
- Find which links on a web page are pagination links☆29Updated 8 years ago
- Train a gensim word2vec model on Wikipedia.☆75Updated 6 years ago
- A thin wrapper around the DBPedia Spotlight REST API☆59Updated 8 months ago