dohliam / more-stoplists
stoplists for African languages generated from the ASP corpus
☆14Updated 9 years ago
Alternatives and similar repositories for more-stoplists:
Users who are interested in more-stoplists are comparing it to the repositories listed below
- List of (possible) English hedge words☆45Updated 2 years ago
- An offline/online field database which adapts to its user's terminology and I-Language. http://fielddb.github.io☆79Updated 2 years ago
- This repository contains a tool and a collections dataset for detecting off-topic pages in Web archived collections.☆18Updated 9 years ago
- An online reference for data journalism☆25Updated 10 years ago
- Collections of English historical texts and data relating to them☆18Updated 3 years ago
- Formula to detect the ease of reading a text according to the Coleman-Liau index (1975)☆14Updated 2 years ago
- Basic dataset for the linguistic data collection.☆15Updated 8 years ago
- automate incrementally producing word pronunciation recordings for Wiktionary through Wikimedia Commons☆22Updated 6 years ago
- generate rules from lists of words☆16Updated 3 years ago
- Wiktionary parser tool for many language editions.☆54Updated 2 years ago
- Web hub based on Wikidata☆36Updated 2 years ago
- sci.pe (science periodicals) extension of schema:ScholarlyArticle to describe the production process, content, distribution and preser…☆4Updated 2 years ago
- download and process d3.js blocks for further indexing and visualization☆24Updated 5 years ago
- A Knowledge Base for research software relying on large-scale text mining and curated knowledge sources☆16Updated last year
- LightSide Workbench☆24Updated last year
- A list of words from the SUBTLEX movie subtitles corpus, sorted by frequency.☆33Updated 5 years ago
- Examples of bad data, especially from government.☆23Updated 7 months ago
- Formula to detect the ease of reading a text according to the SMOG (Simple Measure of Gobbledygook) formula (1969)☆18Updated 2 years ago
- a framework and language for exploring and analyzing feeds of social media data.☆23Updated 13 years ago
- bigram / trigram analysis of Wikipedia; mainly mutual information☆22Updated 13 years ago
- Open Access PDF harvester☆39Updated 10 months ago
- A web-based, token-level annotation tool for non-standard language data☆10Updated 4 years ago
- “Open terminals”, “load CSVs”, “start hacking”☆15Updated 7 years ago
- Experiments to help discussion on Wikipedia talk pages☆66Updated 4 months ago
- JSON datasets for powering 'intelligent' spell and grammar checkers☆33Updated 14 years ago
- Stanford Tregex-inspired language for rule-based dependency tree manipulation.☆21Updated 7 years ago
- Markdown -> IPython conversion tool☆15Updated 10 years ago
- List of (possible) English weasel words☆36Updated 2 years ago
- Simple CORPORA list crawler☆10Updated 8 years ago
- A list of resources for conservation, development, and documentation of endangered, minority, and low or under-resourced human languages.☆34Updated last year