metawarc: a command-line tool for extracting metadata from files in WARC (Web ARChive) archives
☆34Oct 27, 2025Updated 4 months ago
Alternatives and similar repositories for metawarc
Users who are interested in metawarc are comparing it to the repositories listed below
- Simple, fast dictionary-based language detector for short texts.☆20Feb 5, 2026Updated 3 weeks ago
- ☆11Jul 20, 2023Updated 2 years ago
- Napkin is a simple tool to produce statistical analysis of a text☆12Feb 25, 2024Updated 2 years ago
- A list of hashtags that bots automatically retweet. Use them to increase the reach of your tweets and increase the number of followers on…☆16Dec 13, 2021Updated 4 years ago
- Decentralized web archiving☆20Aug 7, 2018Updated 7 years ago
- A UserScript to detect GPT generated comments on Hackernews.☆13Dec 10, 2022Updated 3 years ago
- Crawler that retrieves commoncrawl's crawled hosts and their corresponding IPs☆21Sep 1, 2025Updated 6 months ago
- A collection of data fetchers, and simple quarterly and yearly CVE forecasting models.☆46Oct 1, 2025Updated 5 months ago
- html,css☆12Sep 24, 2021Updated 4 years ago
- The GeoCorpora project aims at creating corpora of fully geo-annotated texts (in particular microblog texts) and developing tools to supp…☆18Aug 12, 2024Updated last year
- List of all pastebin.com analogs I know of. They are useful for finding leaked personal data☆22Jul 18, 2021Updated 4 years ago
- Pollock is a benchmark for data loading on character-delimited files.☆25Apr 9, 2025Updated 10 months ago
- Base45☆22Feb 20, 2026Updated last week
- External twitter feeder for AIL framework☆16Apr 16, 2023Updated 2 years ago
- TikTok Scraper. Download video posts, collect user/trend/hashtag/music feed metadata, sign URLs, etc.☆46Dec 19, 2021Updated 4 years ago
- CocktailParty is a data broker system based on phoenix framework☆23Apr 23, 2025Updated 10 months ago
- List of websites to search for court documents in different countries☆24Jun 1, 2022Updated 3 years ago
- DomainsProject.org HTTP worker☆25Dec 11, 2022Updated 3 years ago
- Capture a URL with Playwright☆30Feb 24, 2026Updated last week
- ☆24Mar 12, 2025Updated 11 months ago
- Extracts tables from .docx files and saves them as .csv or .xls files☆65Oct 11, 2023Updated 2 years ago
- A script to change the authorship of ODT and DOCX comments, redlines and whatnot.☆34Feb 18, 2026Updated last week
- USB Scanning device☆32Sep 16, 2025Updated 5 months ago
- Pythonic way to work with the warning lists defined there: https://github.com/MISP/misp-warninglists☆35Jan 8, 2026Updated last month
- CyCAT.org API back-end server including crawlers☆29Feb 4, 2023Updated 3 years ago
- A Python implementation of our efficient Bloom filter library.☆29Feb 27, 2020Updated 6 years ago
- Lists of not-suitable-for-work words as YARA rules☆29Feb 2, 2026Updated last month
- Template for new OSINT command-line tools☆76Nov 25, 2024Updated last year
- D4 core software (server and sample sensor client)☆43Dec 23, 2023Updated 2 years ago
- AIL project training materials☆39Feb 24, 2026Updated last week
- ☆17Feb 20, 2026Updated last week
- Incident Notification Platform by @NC3-LU☆11Updated this week
- A curated collection of tools, bots, and resources for Open Source Intelligence (OSINT) investigations on Telegram. Includes chat analysi…☆56Oct 5, 2025Updated 4 months ago
- ☆11Oct 1, 2025Updated 5 months ago
- 🐔 A JavaScript library that provides a score for the likelihood of a user using a headless browser.☆37Jan 13, 2021Updated 5 years ago
- A set of YARA rules for the AIL framework to detect leaks or information disclosure☆41Jan 31, 2025Updated last year
- Fast lookup server for NSRL and other hash databases used in digital forensics☆48Jan 26, 2026Updated last month
- CIRCL system forensic tools or a jumble of tools to support forensics☆41Jan 20, 2023Updated 3 years ago
- A tool used to do reverse video searches.☆10Nov 30, 2024Updated last year