dohliam / more-stoplists
stoplists for African languages generated from the ASP corpus
☆14 · Updated 9 years ago
Alternatives and similar repositories for more-stoplists
Users interested in more-stoplists are comparing it to the repositories listed below
- generate rules from lists of words ☆16 · Updated 3 years ago
- An offline/online field database which adapts to its user's terminology and I-Language. http://fielddb.github.io ☆79 · Updated 2 years ago
- This repository contains tool and collections dataset for detecting off-topic pages from Web archived collections. ☆18 · Updated 9 years ago
- Collections of English historical texts and data relating to them ☆18 · Updated 4 years ago
- List of (possible) English hedge words ☆46 · Updated 2 years ago
- Examples of bad data, especially from government. ☆23 · Updated 10 months ago
- sci.pe (science periodicals) extension of schema:ScholarlyArticle to describe the production process, content, distribution and preser… ☆4 · Updated 2 years ago
- Formula to find the grade level according to the (revised) Dale–Chall Readability Formula (1995) ☆31 · Updated 2 years ago
- An online reference for data journalism ☆25 · Updated 11 years ago
- Tools for tracking stories on news homepages ☆48 · Updated 5 years ago
- Basic dataset for the linguistic data collection. ☆15 · Updated 8 years ago
- bigram / trigram analysis of wikipedia; mainly mutual info ☆22 · Updated 13 years ago
- ☆12 · Updated 7 years ago
- Tools for working with Optical Character Recognition output ☆16 · Updated 11 years ago
- automate incrementally producing word pronunciation recordings for Wiktionary through Wikimedia Commons ☆22 · Updated 7 years ago
- Session notes, data, instructions and examples for a hands-on workshop on using a diverse set of tools and practices for journalistic dat… ☆15 · Updated 8 years ago
- Web hub based on Wikidata ☆37 · Updated 2 years ago
- Allow URLs to point to any text piece in a document ☆16 · Updated 7 years ago
- Little list of happy places ☆17 · Updated 4 years ago
- Data analysis pipelines ☆11 · Updated 4 years ago
- Convert between DOM Range instances and text positions. ☆25 · Updated 5 years ago
- download and process d3.js blocks for further indexing and visualization ☆24 · Updated 6 years ago
- Formula to detect the ease of reading a text according to the Coleman-Liau index (1975) ☆14 · Updated 2 years ago
- Framework for creating and accessing UBY resources – sense-linked lexical resources in standard UBY-LMF format ☆22 · Updated 7 years ago
- List of easy American-English words: The New Dale-Chall (1995) ☆32 · Updated 2 years ago
- 'Git for Tabular Data' ☆46 · Updated 8 years ago
- a framework and language for exploring and analyzing feeds of social media data. ☆23 · Updated 13 years ago
- The RICardo dataset compiles trade statistics sources of international trade bilateral flows of the 19th century. ☆18 · Updated last week
- Navigating the sea of publications ☆13 · Updated 9 years ago
- LevelGraph.io Playground ☆11 · Updated 3 years ago