CitizensFoundation / pace-keyword-scanner
CommonCrawl keyword scanner. Scanning one month of CC data for hundreds of keywords takes about 3 hours on an EC2 c5.18xlarge instance. LLM (BERT) based second-level filtering. Developed with support from the EU and the Populism & Civic Engagement H2020 project.
☆15Updated 2 years ago
Alternatives and similar repositories for pace-keyword-scanner
Users who are interested in pace-keyword-scanner are comparing it to the repositories listed below
- Matrix-based News Aggregation to Explore Media Bias☆20Updated 6 years ago
- Track changes to GraphQL APIs by git scraping their schemas☆28Updated last month
- A Google Trends Analytics Package☆13Updated 11 months ago
- how hard is it to get a list of all local news sites in the United States (LOL)☆8Updated 5 years ago
- A list of awesome browser extensions to help with SEO and rank higher!☆23Updated 4 years ago
- The little things give you away... A collection of various small helper stuff – Mirror repo only, no longer kept in sync, refer to gitea.…☆24Updated 4 years ago
- Datasette enrichment for analyzing row data using OpenAI's GPT models☆19Updated last year
- etl pipeline, graphical explorer and general toolbox for investigations with follow the money data☆23Updated last year
- H2O is a web app for creating and reading open educational resources, primarily in the legal field☆38Updated 3 months ago
- A demonstration transnational register of beneficial ownership data from the UK, Denmark, Slovakia and Armenia☆17Updated 6 months ago
- An introduction to free, automated web scraping with GitHub’s powerful new Actions framework.☆28Updated 8 months ago
- ☆12Updated last year
- Datasette showing global power plant data from https://github.com/wri/global-power-plant-database☆17Updated 3 years ago
- The Toolkit API, app, and browser extension. Start preserving now.☆47Updated 5 months ago
- A Fediverse robot account that posts the latest public records requests filed and completed at muckrock.com☆14Updated this week
- A case management app built with Lowdefy.☆32Updated last year
- Datami's mirror repo (source on Gitlab)☆34Updated 3 weeks ago
- 🗳️ Monitor your country, your city council or your organization promises and objectives☆14Updated 3 years ago
- Helps you to visualize the site structure☆9Updated last year
- API for OpenSanctions with support for entity search and bulk matching of data collections. Supports Reconciliation API spec.☆81Updated this week
- Express.js app that runs Puppeteer as a service; visits specified URL with Chromium and sends back various data (requests, cookies, etc.)…☆12Updated 2 years ago
- A collaborative collection of datasets that are common to use within "Follow the Money" investigations with european scope☆13Updated 11 months ago
- Datasette plugin for uploading CSV files and converting them to database tables☆26Updated last year
- Backend, IA-specific tools for crawling and processing the scholarly web. Content ends up in https://fatcat.wiki☆26Updated 9 months ago
- ☆27Updated 4 years ago
- Dockerized workflow automation tool☆20Updated this week
- DocumentCloud's front end source code - Please report bugs, issues and feature requests to info@documentcloud.org☆21Updated this week
- Matomo plugin for Docusaurus v2/v3☆14Updated last year
- Metadata and per-statute PDFs for the U.S. Statutes at Large through volume 64 (1789-1951).☆17Updated 5 years ago
- A Flat Data GitHub Action demo repo☆35Updated 2 weeks ago