karust / gogetcrawl
Extract web archive data using Wayback Machine and Common Crawl
☆155 · Updated 5 months ago
Alternatives and similar repositories for gogetcrawl:
Users who are interested in gogetcrawl are comparing it to the libraries listed below.
- Common Crawl extractor ☆75 · Updated 11 months ago
- Easy to deploy API for transcribing and translating audio / video using OpenAI's whisper model. ☆66 · Updated last year
- Curated list of categorized User Agents ☆87 · Updated this week
- Scraping and listing text and image searches on Google, Bing, DuckDuckGo, Baidu, Yahoo Japan. ☆79 · Updated 11 months ago
- Community curated list of search queries for various products across multiple search engines. ☆174 · Updated last week
- Yet another googlesearch - A Python library for executing intelligent, realistic-looking, and tunable Google searches. ☆277 · Updated last year
- go-trafilatura is a Go port of the trafilatura Python library. ☆59 · Updated 5 months ago
- DomainsProject.org HTTP worker ☆22 · Updated 2 years ago
- This program provides efficient web scraping services for Tor and non-Tor sites. The program has both a CLI and REST API. ☆166 · Updated this week
- A UserScript to detect GPT-generated comments on Hacker News. ☆13 · Updated 2 years ago
- CLI utility to scrape emails from websites ☆160 · Updated last year
- The Architecture of a Web Crawler: Building a Google-Inspired Distributed Web Crawler ☆114 · Updated 4 months ago
- Given a subreddit name and a keyword, this program returns all top (by default) posts that contain the specified keyword. ☆90 · Updated last year
- 📊 Adana - 1-click analytical dashboard for OSINT researchers ☆40 · Updated 11 months ago
- Search for documents in a domain through Search Engines (Google, Bing and Baidu). The objective is to extract metadata ☆215 · Updated last year
- A definitive guide to generating usernames for OSINT purposes ☆163 · Updated 10 months ago
- Run a base query (plus optional add-ons) through Ask, Bing, Brave, DuckDuckGo, Yahoo, and Yandex. ☆22 · Updated 2 years ago
- A tool for searching common variations of a human name ☆45 · Updated 6 months ago
- Command-line tool for clustering geolocations 📍 ☆40 · Updated 3 months ago
- Retrieves archived tweets from Wayback Machine in HTML, CSV, and JSON ☆102 · Updated last month
- Bruter is an OSINT tool, an experiment in building a simple reconnaissance app for fun 🕵️‍♂️ ☆54 · Updated 7 months ago
- Maltego transforms for investigative journalism ☆81 · Updated last year
- AIx is a CLI tool to interact with Large Language Models (LLM) APIs. ☆283 · Updated this week
- A guide to LLM hacking: fundamentals, prompt injection, offense, and defense ☆148 · Updated 2 years ago
- Drill into WARC web archives ☆137 · Updated 6 months ago
- Scraper for Odysee: alt-tech platform for sharing video ☆17 · Updated last year
- Golang Crawling and scraping framework ☆111 · Updated 2 months ago
- An advanced Twitter scraping & OSINT tool written in Python that doesn't use Twitter's API, allowing you to scrape Tweets and more whil… ☆185 · Updated last year
- A fast GitHub stargazers information gathering tool ☆73 · Updated 3 years ago
- A selection of useful Custom Search Engines for OSINT. ☆63 · Updated 3 months ago