CitizensFoundation / pace-keyword-scanner
CommonCrawl keyword scanner. Scanning one month of CommonCrawl (CC) data for hundreds of keywords takes about 3 hours on an EC2 c5.18xlarge instance. A second level of filtering uses an LLM (BERT). Developed with support from the EU and the Populism & Civic Engagement H2020 project.
☆15 · Updated 2 years ago
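The description above outlines a two-stage pipeline: a keyword scan over CommonCrawl text, followed by BERT-based filtering. The following is a minimal Python sketch of what a first-pass keyword match could look like. It is not the project's implementation. It assumes each page arrives as a `(url, text)` pair. The names `build_keyword_pattern` and `scan_pages` are hypothetical, and the BERT filtering stage is not shown.

```python
import re
from typing import Iterable, Iterator, List, Tuple


def build_keyword_pattern(keywords: Iterable[str]) -> "re.Pattern[str]":
    # Longer keywords first so multi-word phrases are tried before shorter alternatives.
    ordered = sorted(keywords, key=len, reverse=True)
    alternation = "|".join(re.escape(k) for k in ordered)
    # \b anchors assume each keyword starts and ends with a word character.
    return re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)


def scan_pages(
    pages: Iterable[Tuple[str, str]], pattern: "re.Pattern[str]"
) -> Iterator[Tuple[str, List[str]]]:
    # Yield (url, sorted distinct lowercase hits) for pages with at least one match.
    for url, text in pages:
        hits = {m.group(0).lower() for m in pattern.finditer(text)}
        if hits:
            yield url, sorted(hits)


# Hypothetical example input; real input would be text extracted from CommonCrawl records.
pages = [
    ("https://example.org/a", "Populism and civic engagement are rising."),
    ("https://example.org/b", "Nothing relevant here."),
]
pattern = build_keyword_pattern(["populism", "civic engagement"])
results = list(scan_pages(pages, pattern))
print(results)  # [('https://example.org/a', ['civic engagement', 'populism'])]
```

Compiling all keywords into one alternation means each page is scanned once rather than once per keyword, which matters when the keyword list runs into the hundreds. Pages that pass this cheap filter would then be candidates for the more expensive model-based second stage.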
Alternatives and similar repositories for pace-keyword-scanner
Users who are interested in pace-keyword-scanner are comparing it to the libraries listed below:
- API for OpenSanctions with support for entity search and bulk matching of data collections. Supports the Reconciliation API spec. ☆108 · Updated this week
- Track changes to GraphQL APIs by git scraping their schemas ☆30 · Updated 6 months ago
- Ontology dataset for the open_numbers namespace ☆10 · Updated 11 months ago
- The little things give you away... A collection of various small helper stuff – Mirror repo only, no longer kept in sync, refer to gitea.… ☆24 · Updated 5 years ago
- Real-Time Proxy & Web Scraping API ☆24 · Updated 6 years ago
- Fully customizable open source voice experience that can be hosted on any website. ☆33 · Updated 3 years ago
- keywords-extract - Command line tool to extract keywords from any web page. ☆61 · Updated 7 years ago
- Matrix-based News Aggregation to Explore Media Bias ☆19 · Updated 7 years ago
- An open-source archive that gathers, saves, shares and analyzes news homepages ☆146 · Updated 3 weeks ago
- Add website scraping abilities to Datasette ☆64 · Updated 2 years ago
- A helper library full of URL-related heuristics. ☆73 · Updated last month
- Create a static website with Fly - HTML from the example ☆21 · Updated last year
- Twitter stream + search API grabber ☆105 · Updated 2 years ago
- All that favours real-time democracy ☆15 · Updated 3 years ago
- Easily build and maintain any kind of contract. Free and Open Source ☆96 · Updated 8 years ago
- (no description) ☆14 · Updated 3 years ago
- A curated list of awesome resources on crowdsourcing, human computation, and online behavioral experiments. ☆49 · Updated 7 years ago
- Real-time insights into the news you read ☆28 · Updated 2 years ago
- Datasette enrichment for analyzing row data using OpenAI's GPT models ☆21 · Updated last year
- RSS Reader API written in Django Rest ☆45 · Updated last year
- Datasette plugin for rendering HTML based on JSON values ☆28 · Updated 3 years ago
- Penme is a lightweight open source note-taking app focused on privacy! ☆26 · Updated 5 years ago
- Grammarify is an npm package that safely cleans up text that has misspellings, improper capitalization, lexical illusions, among other thin… ☆73 · Updated 2 years ago
- Code and data belonging to our CSCW 2019 paper: "Dark Patterns at Scale: Findings from a Crawl of 11K Shopping Websites". ☆133 · Updated 6 years ago
- Datasette plugin for uploading CSV files and converting them to database tables ☆27 · Updated last year
- Extract networks of entities from journalistic reporting ☆48 · Updated 2 years ago
- The Misinformation Game is a social-media simulator built to study how people interact with information on social media. ☆31 · Updated 3 months ago
- Word analysis, by domain, on the Common Crawl data set for the purpose of finding industry trends ☆58 · Updated last year
- Open Source Captcha API ☆49 · Updated last year
- A command-line interface that allows you to manage the back end of your self-hosted typesense server. Builds on top of the typesense js l… ☆16 · Updated 2 years ago