karust / gogetcrawl

Extract web archive data using Wayback Machine and Common Crawl

☆171

Alternatives and similar repositories for gogetcrawl

Users who are interested in gogetcrawl are comparing it to the libraries listed below.

hynky1999 / CmonCrawl
Common crawl extractor
☆84 Updated last year
bellingcat / whisperbox-transcribe
Easy to deploy API for transcribing and translating audio / video using OpenAI's whisper model.
☆68 Updated last year
chryzsh / GPTCommentDetector
A UserScript to detect GPT generated comments on Hackernews.
☆13 Updated 3 years ago
opsdisk / yagooglesearch
Yet another googlesearch - A Python library for executing intelligent, realistic-looking, and tunable Google searches.
☆290 Updated last year
cvcio / rtaa-72
RTAA-72 is CVCIO's real-time intelligence dashboard for Twitter
☆20 Updated 3 years ago
tb0hdan / idun
DomainsProject.org HTTP worker
☆25 Updated 3 years ago
claromes / waybacktweets
Archived tweets from the Wayback Machine
☆166 Updated 8 months ago
crissyfield / troll-a
Drill into WARC web archives
☆141 Updated last year
davemolk / searcher
Run a base query (plus optional add-ons) through ask, bing, brave, duck duck go, yahoo, and yandex.
☆25 Updated 2 years ago
commoncrawl / cc-webgraph
Tools to construct and process Common Crawl webgraphs
☆105 Updated last week
projectdiscovery / useragent
Curated list of categorized User Agents
☆110 Updated 2 weeks ago
Volifter / Belligcat-Hackathon
☆20 Updated last month
tanaikech / goris
This is a CLI tool to search for images with Google Reverse Image Search (goris).
☆122 Updated 7 months ago
commoncrawl / cc-crawl-statistics
Statistics of Common Crawl monthly archives mined from URL index files
☆208 Updated last week
tb0hdan / freya
DomainsProject.org DNS worker
☆26 Updated last year
bellingcat / name-variant-search
A tool for searching common variations of a human name
☆49 Updated last month
dwisiswant0 / stargather
A fast GitHub stargazers information gathering tool
☆72 Updated 3 years ago
s0rg / crawley
The unix-way web crawler
☆329 Updated 3 weeks ago
sshh12 / llm_osint
LLM OSINT is a proof-of-concept method of using LLMs to gather information from the internet and then perform a task with this informatio…
☆257 Updated last year
jois-code / tweeds
An advanced Twitter scraping & OSINT tool written in Python that doesn't use Twitter's API, allowing you to scrape a Tweets and more whil…
☆187 Updated 2 years ago
bellingcat / sugartrail
Visualise networks of companies, officers and addresses connected through UK Companies House
☆71 Updated 3 months ago
XORbit01 / webpalm
🕸️ Crawl in the web network
☆380 Updated 10 months ago
castrickclues / Nikelligence
Look up an email address or a name on Nike Run Club (NRC)
☆15 Updated last year
claromes / telegramtrac
Browser interface to Telegram's API with additional modules for generating datasets and network graphs
☆13 Updated 2 years ago
crissyfield / crux-dumps
📝 This repository contains dumps of the monthly "Chrome UX Report" (CrUX) datasets.
☆45 Updated 3 weeks ago
soxoj / username-generation-guide
A definitive guide to generating usernames for OSINT purposes
☆166 Updated last year
sw33tLie / impressive-chatgpt
A collection of impressive and useful results from OpenAI's chatgpt
☆76 Updated 3 years ago
DedSecInside / gotor
This program provides efficient web scraping services for Tor and non-Tor sites. The program has both a CLI and REST API.
☆168 Updated 3 months ago
davemolk / goGetJS
A tool for extracting, searching, and saving JavaScript files (with optional headless browser)
☆38 Updated 3 years ago
projectdiscovery / aix
AIx is a cli tool to interact with Large Language Models (LLM) APIs.
☆311 Updated last week