wardbradt / HTMLST
A library to extract sentences from HTML
☆11Updated 4 years ago
Alternatives and similar repositories for HTMLST:
Users who are interested in HTMLST are comparing it to the libraries listed below
- Extended Data Structure Features in Python☆14Updated 6 years ago
- A library to analyze text in relation to well-known novels from Project Gutenberg.☆10Updated 6 years ago
- Helpful resources and public-facing website for course The Open Source Movement☆16Updated last year
- Javascript Selection Object Library☆13Updated 7 years ago
- A Python library which facilitates the processing of images with a uniform layout (e.g. calendars, schedules, etc.) and inputting their d…☆13Updated 7 years ago
- Python analytics library that analyzes data on your websites and provides recommendations. Built on Google Analytics API. 📈📉🚀☆15Updated 7 years ago
- A platform to evaluate the ideological biases of the web.☆8Updated 2 years ago
- A fork of boilerpipe with Python 3 and small fixes, ported from source https://pypi.python.org/pypi/boilerpipe-py3.☆45Updated 5 years ago
- Python wrapper library for the Datamuse API☆78Updated 2 years ago
- English grammar checker code☆43Updated 11 years ago
- A Javascript module to scrape, analyze, and cache Congressional bills☆35Updated 2 years ago
- Python library whose task is to identify and disambiguate acronyms and abbreviations in text.☆23Updated 9 years ago
- Uses NLP and wikipedia to try to generate trivia questions☆133Updated 7 years ago
- Project implementation and code for finding who wrote the given texts (using NLP)☆23Updated 5 years ago
- Quickly extract multi-word phrases from a corpus☆191Updated 4 years ago
- Automatic News Corpus Builder☆40Updated 7 years ago
- Python port of Mikolov's word2phrase.c from the word2vec toolkit☆111Updated 5 years ago
- High-coverage and high-precision lexica of terms annotated with emotion scores for English and Italian.☆152Updated 5 months ago
- Recipe for Spanish POS tagging using the CESS corpus with NLTK☆18Updated 8 years ago
- Python interface to Boilerpipe, Boilerplate Removal and Fulltext Extraction from HTML pages☆543Updated 3 years ago
- Genderizer is a language-independent module which tries to detect gender by looking at given first names and/or analyzing sample texts.☆65Updated 10 years ago
- Language detection extension for spaCy 2.0+☆112Updated 6 years ago
- Package for performing Reddit-based text analysis☆21Updated 6 years ago
- HackDelft☆81Updated 7 years ago
- A simple interface for the CMU pronouncing dictionary☆311Updated 7 months ago
- Web page segmentation and noise removal☆55Updated last year
- A Python project inspired by the research of Chloé Kiddon and Yuriy Brun. Part of the Funniest Computer Ever Open Source initiative☆57Updated 6 years ago
- A collection of functions that measure the readability of a given body of text☆191Updated 7 years ago
- Python interface to the Stanford Named Entity Recognizer☆292Updated 3 years ago
- Making sense embedding out of word embeddings using graph-based word sense induction☆213Updated 3 years ago