metawarc: a command-line tool for extracting metadata from files stored in WARC (Web ARChive) archives
☆35 · Oct 27, 2025 · Updated 4 months ago
Alternatives and similar repositories for metawarc
Users that are interested in metawarc are comparing it to the libraries listed below.
- Repository for the paper "Integrating Visual and Textual Inputs for Searching Large-Scale Map Collections with CLIP" ☆12 · Oct 1, 2024 · Updated last year
- Searching millions of .gov PDFs ☆32 · Updated this week
- DHQ is an open-access, peer-reviewed journal of digital humanities. ☆17 · Updated this week
- DomainsProject.org HTTP worker ☆25 · Dec 11, 2022 · Updated 3 years ago
- A collection of data fetchers, and simple quarterly and yearly CVE forecasting models. ☆46 · Oct 1, 2025 · Updated 5 months ago
- Crawler that retrieves commoncrawl's crawled hosts and their corresponding IPs ☆21 · Sep 1, 2025 · Updated 6 months ago
- A UserScript to detect GPT generated comments on Hackernews. ☆13 · Dec 10, 2022 · Updated 3 years ago
- Simple, fast dictionary-based language detector for short texts. ☆20 · Feb 5, 2026 · Updated last month
- Napkin is a simple tool to produce statistical analysis of a text ☆12 · Feb 25, 2024 · Updated 2 years ago
- html,css ☆12 · Sep 24, 2021 · Updated 4 years ago
- A study group for v4 of the fastai introduction to deep learning course with a focus on applications in GLAM settings ☆15 · Oct 7, 2020 · Updated 5 years ago
- List of all pastebin.com analogs I know of. They are useful for finding leaked personal data ☆22 · Jul 18, 2021 · Updated 4 years ago
- Computer-Aided Metadata Generation for Photoarchives Initiative ☆18 · Nov 25, 2020 · Updated 5 years ago
- Google's list of Certificate Transparency logs as a rust crate for use with sct.rs ☆14 · Feb 17, 2023 · Updated 3 years ago
- Web Archiving Course ☆23 · Mar 4, 2024 · Updated 2 years ago
- the source code that powered gitlive.net ☆11 · Feb 12, 2016 · Updated 10 years ago
- Passivedns monitor implementation in Rust. ☆12 · Apr 21, 2016 · Updated 9 years ago
- A curated blocklist of Autonomous System Numbers (ASNs) associated with VPN providers, datacenters, and hosting services commonly used fo… ☆15 · Mar 11, 2026 · Updated 2 weeks ago
- External twitter feeder for AIL framework ☆16 · Apr 16, 2023 · Updated 2 years ago
- TikTok Scraper. Download video posts, collect user/trend/hashtag/music feed metadata, sign URL and etc. ☆46 · Dec 19, 2021 · Updated 4 years ago
- s3 as a datastore: A way to use S3 as a key-value datastore instead of a real datastore. can be read as s3aadatastore ☆14 · Mar 16, 2023 · Updated 3 years ago
- The Brandefense cyber threat intelligence team is always researching new threats and writing research reports. Our latest Threat Reports … ☆23 · Oct 1, 2025 · Updated 5 months ago
- A tool for collecting archival slivers of the web and web archives ☆17 · Feb 18, 2025 · Updated last year
- ☆24 · Mar 12, 2025 · Updated last year
- OSINT: how to get started. Tools and ideas for collecting and analyzing open sources. ☆11 · Mar 28, 2021 · Updated 4 years ago
- Capture a URL with Playwright ☆30 · Updated this week
- Base45 ☆22 · Feb 20, 2026 · Updated last month
- IIIF experiments with Gallica content ☆31 · Nov 16, 2025 · Updated 4 months ago
- A template for standard Maltego transformation ☆13 · Dec 8, 2021 · Updated 4 years ago
- Network scan tool for host and service discovery. Written in Rust. ☆22 · Feb 17, 2026 · Updated last month
- Sharable scripts and stylesheets from the Northeastern University Women Writers Project ☆24 · Jan 22, 2026 · Updated 2 months ago
- CyCAT.org API back-end server including crawlers ☆29 · Feb 4, 2023 · Updated 3 years ago
- Quick Cache and Archive search buttons ☆38 · May 11, 2024 · Updated last year
- GraphQLmap is a scripting engine to interact with a graphql endpoint for pentesting purposes. - Do not use for illegal testing ;) ☆15 · Mar 11, 2024 · Updated 2 years ago
- Pythonic way to work with the warning lists defined there: https://github.com/MISP/misp-warninglists ☆35 · Jan 8, 2026 · Updated 2 months ago
- ☆11 · Oct 1, 2025 · Updated 5 months ago
- Converts binary files of 1C (1CD, cf, epf, efd, etc.) to grepable CSV ☆12 · Feb 12, 2024 · Updated 2 years ago
- A collection of cyberchef recipes for use in osint investigations ☆14 · Jul 2, 2022 · Updated 3 years ago
- Tools and resources that may be useful to you when conducting investigations related to Islamic Republic of Iran ☆21 · Sep 10, 2025 · Updated 6 months ago