CitizensFoundation / pace-keyword-scanner
CommonCrawl keyword scanner. Scanning one month of CommonCrawl data for hundreds of keywords takes about 3 hours on an EC2 c5.18xlarge instance. Second-level filtering is LLM (BERT) based. Developed with support from the EU and the Populism & Civic Engagement H2020 project.
☆13Updated last year
Alternatives and similar repositories for pace-keyword-scanner:
Users who are interested in pace-keyword-scanner are comparing it to the libraries listed below
- A collaborative collection of datasets that are commonly used within "Follow the Money" investigations with European scope☆13Updated 7 months ago
- Track changes to GraphQL APIs by git scraping their schemas☆26Updated this week
- Datasette showing global power plant data from https://github.com/wri/global-power-plant-database☆17Updated 2 years ago
- ☆14Updated 2 years ago
- A Google Trends Analytics Package☆13Updated 7 months ago
- PesaYetu, an easy-to-use visualization tool that helps journalists quickly find, analyse and compare government budget data to help fact-…☆11Updated 7 months ago
- ☆26Updated 4 years ago
- Scrape various open data directories to create an index of what's available out there☆34Updated this week
- ETL pipeline, graphical explorer and general toolbox for investigations with follow the money data☆14Updated last year
- A demonstration transnational register of beneficial ownership data from the UK, Denmark, Slovakia and Armenia☆17Updated 2 months ago
- DocumentCloud's back end source code - Please report bugs, issues and feature requests to info@documentcloud.org☆35Updated this week
- Everything you need to know about Aquila Network Neural Search Ecosystem. Official repositories, client libraries, ecosystem projects, boi…☆33Updated 3 years ago
- ☆15Updated this week
- Materials to reproduce findings in our story, "Google’s Top Search Result? Surprise! It’s Google"☆34Updated 4 years ago
- A Command line interface that allows you to manage the back end of your self hosted typesense server. Builds on top of the typesense js l…☆16Updated last year
- Backend, IA-specific tools for crawling and processing the scholarly web. Content ends up in https://fatcat.wiki☆25Updated 5 months ago
- A Fediverse robot account that posts the latest public records requests filed and completed at muckrock.com☆14Updated this week
- An introduction to free, automated web scraping with GitHub’s powerful new Actions framework.☆29Updated 4 months ago
- Datasette plugin for rendering Markdown☆26Updated last year
- Common Paper Service Level Agreement☆13Updated 9 months ago
- Quit Datasette if it has not received traffic for a specified time period☆16Updated 10 months ago
- Rig for deploying DocumentCloud viewers to S3.☆13Updated 3 years ago
- 🗳️ Monitor your country, your city council or your organization promises and objectives☆13Updated 3 years ago
- A visualisation library for beneficial ownership structures☆21Updated 3 weeks ago
- 📚 Online archive for annual reports of the German internal intelligence☆11Updated 2 months ago
- Datasette plugin for uploading CSV files and converting them to database tables☆25Updated 9 months ago
- Datasette plugin for publishing data using Vercel☆44Updated 2 years ago
- List of privacy-friendly analytics solutions☆18Updated last year
- Datasette plugin for rendering HTML based on JSON values☆26Updated 2 years ago
- web app for visualizing Wikidata items on a timeline☆14Updated 5 years ago