dsc / guess-language
Attempts to determine the natural language of a selection of Unicode (UTF-8) text (a clone of http://code.google.com/p/guess-language with package metadata)
☆48 · Updated 15 years ago
Alternatives and similar repositories for guess-language:
Users interested in guess-language are comparing it to the libraries listed below:
- Lightweight, multilingual natural language processing ☆63 · Updated 11 years ago
- C++ Ternary Search Tree implementation with Python bindings ☆43 · Updated 7 years ago
- Python interface to IMDb plain-text data files ☆41 · Updated 7 years ago
- A tool to segment text based on frequencies and the Viterbi algorithm: "#TheBoyWhoLived" => ['#', 'The', 'Boy', 'Who', 'Lived'] ☆82 · Updated 8 years ago
- ... just because nltk is too heavy ☆35 · Updated 14 years ago
- clone of https://code.google.com/p/splitta/ so it can be a git submodule ☆34 · Updated 11 years ago
- Pipeline for distributed Natural Language Processing, made in Python ☆65 · Updated 8 years ago
- Aelius is a suite of Python, NLTK-based modules and language data for training and evaluating POS-taggers for Brazilian Portuguese and an… ☆19 · Updated 13 years ago
- Memory-based shallow parser for Python ☆74 · Updated 5 years ago
- rapid nlp prototyping ☆71 · Updated 2 years ago
- Updates to Zope's keyphrase extractor (forked from 1.1.0) ☆67 · Updated 7 years ago
- This repository contains a tool and a collections dataset for detecting off-topic pages in Web-archived collections. ☆18 · Updated 9 years ago
- Python's missing statistical Swiss Army knife ☆15 · Updated 9 years ago
- Twitter crawler ☆11 · Updated 10 years ago
- A platform for storing large semantic networks on MongoDB ☆22 · Updated 13 years ago
- AutoCorpus is a set of utilities that enable automatic extraction of language corpora and language models from publicly available dataset… ☆37 · Updated 13 years ago
- mediawiki parser library ☆104 · Updated last week
- Python wrapper for the Readability API. ☆134 · Updated 3 years ago
- This is a mirror of the script by Giuseppe Attardi, and contains history before the official repo started: https://github.com/attardi/wik… ☆259 · Updated 8 years ago
- DEPRECATED - name_tools for Open States and other projects ☆19 · Updated 5 years ago
- Stylometric framework in Python ☆17 · Updated 9 years ago
- TweeQL is a Query Language for Tweets: SELECT brand(text) AS brand, sentiment(text) AS sentiment FROM twitter_sample; ☆193 · Updated 10 years ago
- Python API for Various DB-Backed Simhash Clusters ☆64 · Updated 8 years ago
- Some convenient natural language tools that build on NLTK. ☆85 · Updated 10 years ago
- A simple and fast search engine ☆70 · Updated 2 years ago
- common data interchange format for document processing pipelines that apply natural language processing tools to large streams of text ☆35 · Updated 8 years ago
- Let's bring Readability to Chrome! ☆210 · Updated 7 years ago
- Discussion Summarization is the process of condensing a text document which is a collection of discussion threads, using CBS (Cluster Bas… ☆12 · Updated 10 years ago
- Intro to some NLP concepts in Python for a class ☆96 · Updated 10 years ago
- Python bindings for CLD2. ☆16 · Updated 6 years ago