CitizensFoundation / pace-keyword-scanner
CommonCrawl keyword scanner. Scanning one month of CommonCrawl data for hundreds of keywords takes about 3 hours on an EC2 c5.18xlarge instance. LLM (BERT) based second-level filtering. Developed with support from the EU and the Populism & Civic Engagement H2020 project.
☆14Updated 2 years ago
Alternatives and similar repositories for pace-keyword-scanner:
Users who are interested in pace-keyword-scanner are comparing it to the libraries listed below
- Matrix-based News Aggregation to Explore Media Bias☆20Updated 6 years ago
- Track changes to GraphQL APIs by git scraping their schemas☆28Updated 2 weeks ago
- Datasette enrichment for analyzing row data using OpenAI's GPT models☆19Updated 11 months ago
- how hard is it to get a list of all local news sites in the United States (LOL)☆8Updated 4 years ago
- A demonstration transnational register of beneficial ownership data from the UK, Denmark, Slovakia and Armenia☆17Updated 5 months ago
- H2O is a web app for creating and reading open educational resources, primarily in the legal field☆38Updated 2 months ago
- Datasette plugin providing a UI for executing SQL writes against the database☆10Updated 7 months ago
- A visualisation library for beneficial ownership structures☆21Updated 2 months ago
- Dead simple cron service for making HTTP calls on a regular schedule.☆14Updated 4 years ago
- A Google Trends Analytics Package☆13Updated 10 months ago
- A minimal client-side library to convert your vanilla URLs to deep links.☆20Updated 4 years ago
- Datasette plugin for uploading CSV files and converting them to database tables☆26Updated last year
- The Misinformation Game is a social-media simulator built to study how people interact with information on social-media.☆28Updated last month
- API for OpenSanctions with support for entity search and bulk matching of data collections. Supports Reconciliation API spec.☆81Updated this week
- Everything you need to know about Aquila Network Neural Search Ecosystem. Official repositories, client libraries, ecosystem projects, boi…☆32Updated 3 years ago
- A Flat Data GitHub Action demo repo☆35Updated 3 weeks ago
- ETL pipeline, graphical explorer and general toolbox for investigations with follow the money data☆21Updated last year
- Matomo plugin for Docusaurus v2/v3☆14Updated last year
- Data Catalog Specification (Schema and Protocol)☆21Updated 6 years ago
- Open Access PDF harvester☆40Updated 11 months ago
- A Command line interface that allows you to manage the back end of your self hosted typesense server. Builds on top of the typesense js l…☆16Updated last year
- Datasette showing global power plant data from https://github.com/wri/global-power-plant-database☆17Updated 2 years ago
- ☆16Updated last week
- ☆12Updated last year
- A collaborative collection of datasets that are common to use within "Follow the Money" investigations with European scope☆13Updated 10 months ago
- Fully customizable open source voice experience that can be hosted on any website.☆33Updated 2 years ago
- LLM Oracle is a GPT-4 powered tool for predicting future events. It's like a Magic 8 Ball that is able to perform basic research, calcula…☆17Updated last year
- List of publicly available, free/open source and open access resources for learning and doing data journalism.☆45Updated last year
- DocumentCloud's back end source code - Please report bugs, issues and feature requests to info@documentcloud.org☆38Updated last week
- Datasets used for articles and stories made available on Pointer (www.pointer.nl)☆10Updated 5 years ago