dohliam / more-stoplists
stoplists for African languages generated from the ASP corpus
☆14 Updated 8 years ago
Related projects
Alternatives and complementary repositories for more-stoplists
- List of (possible) English hedge words ☆44 Updated 2 years ago
- bigram / trigram analysis of wikipedia; mainly mutual info ☆22 Updated 12 years ago
- This repository contains tool and collections dataset for detecting off-topic pages from Web archived collections. ☆18 Updated 9 years ago
- An offline/online field database which adapts to its user's terminology and I-Language. http://fielddb.github.io ☆79 Updated last year
- My thesis on "Open Source Code and Low Resource Languages" for an MSc in Language Science and Technology at Saarland University ☆20 Updated 6 years ago
- JSON datasets for powering 'intelligent' spell and grammar checkers ☆33 Updated 13 years ago
- Examples of bad data, especially from government. ☆22 Updated 3 months ago
- Basic dataset for the linguistic data collection. ☆15 Updated 7 years ago
- Web hub based on Wikidata ☆36 Updated last year
- Discover, analyze and present data from the web and mobile in meaningful ways ☆83 Updated 11 years ago
- Auto-generated trivia questions based on DBPedia data. ☆15 Updated 7 years ago
- Tools for working with Optical Character Recognition output ☆16 Updated 10 years ago
- sci.pe (science periodicals) extension of schema:ScholarlyArticle to describe the production process, content, distribution and preser… ☆4 Updated last year
- DBpedia, which frequently crawls and analyses over 120 Wikipedia language editions has near complete information about (1) which facts ar… ☆10 Updated last year
- A compile-to-JSON data pipeline scripting language [ DEPRECATED - More info on active projects and modules at https://dat-ecosystem.org/ … ☆43 Updated 2 years ago
- Resolve data table conflicts ☆17 Updated 9 years ago
- CLI utility to spider websites and extract links to data files ☆13 Updated 9 years ago
- Navigating the sea of publications ☆13 Updated 8 years ago
- An online reference for data journalism ☆25 Updated 10 years ago
- A browser extension that utilizes sentiment analysis to find and highlight constructive comments on various social media platforms that o… ☆39 Updated 5 years ago
- An npm package that allows easy entity searching of Wikidata. ☆10 Updated 7 years ago
- Lexicons for n-gram sentiment analysis ☆20 Updated 9 years ago
- Code for recon16 hack day ☆16 Updated 6 years ago
- download and process d3.js blocks for further indexing and visualization ☆24 Updated 5 years ago
- NYT Risk Semantics Project ☆12 Updated 8 years ago
- A simple transformation/data processing pipeline for CrisisNET ☆15 Updated 10 years ago
- CLI tool for importing entities from Wikidata / Wikibase ☆23 Updated 2 years ago
- Session notes, data, instructions and examples for a hands-on workshop on using a diverse set of tools and practices for journalistic dat… ☆15 Updated 8 years ago
- bilingual dictionary extractor from parallel corpora ☆22 Updated 10 years ago