CitizensFoundation / pace-keyword-scanner
CommonCrawl keyword scanner. Scanning one month of CommonCrawl (CC) data for hundreds of keywords takes about 3 hours on an EC2 c5.18xlarge instance. Includes LLM (BERT)-based second-level filtering. Developed with support from the EU and the Populism & Civic Engagement H2020 project.
☆15 · Updated 2 years ago
Alternatives and similar repositories for pace-keyword-scanner
Users who are interested in pace-keyword-scanner are comparing it to the libraries listed below.
- Track changes to GraphQL APIs by git scraping their schemas ☆28 · Updated last month
- A Google Trends Analytics Package ☆13 · Updated last year
- Vector Embedding Markup Language - markup language designed specifically for annotating and structuring data related to vector embeddings… ☆12 · Updated last year
- Scrape various open data directories to create an index of what's available out there ☆37 · Updated 3 months ago
- Dead simple cron service for making HTTP calls on a regular schedule. ☆14 · Updated 4 years ago
- A list of awesome browser extensions to help with SEO and rank higher! ☆23 · Updated 4 years ago
- Datasette showing global power plant data from https://github.com/wri/global-power-plant-database ☆17 · Updated last week
- Matrix-based News Aggregation to Explore Media Bias ☆20 · Updated 6 years ago
- Helps you to visualize the site structure ☆9 · Updated last year
- Repository to allow collaboration between Cycle Labs Cloud community in support of the community. ☆9 · Updated 3 years ago
- Datasette enrichment for analyzing row data using OpenAI's GPT models ☆19 · Updated last year
- ☆12 · Updated 2 months ago
- The little things give you away... A collection of various small helper stuff – Mirror repo only, no longer kept in sync, refer to gitea.… ☆24 · Updated 4 years ago
- how hard is it to get a list of all local news sites in the United States (LOL) ☆8 · Updated 5 years ago
- Scrape and parse Google search results in Node.JS ☆32 · Updated 2 years ago
- A Flat Data GitHub Action demo repo ☆35 · Updated this week
- Use GPTparser with your OpenAI API to scrape & parse files into structured JSON files. ☆14 · Updated last year
- ☆12 · Updated last year
- Data API and micro orm for DuckDB and MotherDuck ☆8 · Updated 5 months ago
- ☆14 · Updated 3 years ago
- Email Enricher is a free, offline alternative to Clearbit for enriching emails. Determine if an email likely belongs to a Fortune 1000 co… ☆18 · Updated last year
- DocumentCloud's front end source code - Please report bugs, issues and feature requests to info@documentcloud.org ☆21 · Updated this week
- Open Access PDF harvester ☆40 · Updated last year
- Hacker News Search and RAG built using Rust actix-web, minijinja, SolidJS, Vite, and Redis queues ☆27 · Updated 5 months ago
- Fully customizable open source voice experience that can be hosted on any website. ☆33 · Updated 3 years ago
- A minimal client-side library to convert your vanilla URLs to deep links. ☆20 · Updated 4 years ago
- GNewsScraper is a TypeScript package that scrapes article data from Google News based on a keyword or phrase. It returns the results as a… ☆12 · Updated last year
- An introduction to free, automated web scraping with GitHub’s powerful new Actions framework. ☆28 · Updated 9 months ago
- A visualisation library for beneficial ownership structures ☆21 · Updated last month
- DocumentCloud's back end source code - Please report bugs, issues and feature requests to info@documentcloud.org ☆39 · Updated 2 weeks ago