CitizensFoundation / pace-keyword-scanner
CommonCrawl keyword scanner. Scanning one month of CommonCrawl data for hundreds of keywords takes about 3 hours on an EC2 c5.18xlarge instance. LLM (BERT)-based second-level filtering. Developed with support from the EU and the Populism & Civic Engagement H2020 project.
☆15Updated 2 years ago
Alternatives and similar repositories for pace-keyword-scanner
Users that are interested in pace-keyword-scanner are comparing it to the libraries listed below
- Ontology dataset for open_numbers namespace☆10Updated 11 months ago
- Real-Time Proxy & Web Scraping API☆24Updated 6 years ago
- Track changes to GraphQL APIs by git scraping their schemas☆30Updated 6 months ago
- A curated list of promising Web Data Extractors resources☆29Updated 5 years ago
- API for OpenSanctions with support for entity search and bulk matching of data collections. Supports Reconciliation API spec.☆105Updated last week
- The little things give you away... A collection of various small helper stuff – Mirror repo only, no longer kept in sync, refer to gitea.…☆24Updated 5 years ago
- Matrix-based News Aggregation to Explore Media Bias☆20Updated 7 years ago
- A minimal client-side library to convert your vanilla URLs to deep links.☆21Updated 4 years ago
- API client for Aleph, supports bulk entity and document upload.☆28Updated 11 months ago
- An open-source archive that gathers, saves, shares and analyzes news homepages☆145Updated last week
- ☆30Updated 11 years ago
- A list of awesome browser extensions to help with SEO and rank higher!☆23Updated 4 years ago
- A Command line interface that allows you to manage the back end of your self hosted typesense server. Builds on top of the typesense js l…☆16Updated 2 years ago
- Dataset files for the Open Data on GitHub paper☆30Updated 7 months ago
- A helper library full of URL-related heuristics.☆70Updated 2 weeks ago
- keywords-extract - Command line tool to extract keywords from any web page.☆61Updated 6 years ago
- Best tool for your startup task, A toolchain for your entire startup.☆24Updated 2 years ago
- Twitter stream + search API grabber☆105Updated 2 years ago
- Frontend interface for Datashare, a self-hosted search engine for documents.☆38Updated this week
- Easily build and maintain any kind of contract. Free and Open Source☆96Updated 7 years ago
- Datasette plugin for uploading CSV files and converting them to database tables☆27Updated last year
- Datasette enrichment for analyzing row data using OpenAI's GPT models☆21Updated last year
- Create blockchain-ready document workflows, own your data. NOTICE: Looking for maintainer.☆19Updated last week
- LTI app for integrating with learning management systems☆49Updated last week
- ☆16Updated this week
- Public API client for GETTR, a "non-bias [sic] social network," designed for data archival and analysis.☆94Updated 3 months ago
- Use GPTparser with your OpenAI API to scrape & parse files into structured JSON files.☆14Updated last year
- Extract networks of entities from journalistic reporting☆48Updated 2 years ago
- Scrape HN to track links from specific domains☆63Updated this week
- Mecodify tool for Twitter data analysis and visualisation☆42Updated 2 years ago