tbrianjones/website_extractor

An in-memory web crawler and scraper built to extract data from public websites in small runs. The current implementation takes a .csv file of URLs and crawls each site, extracting basic information such as emails, phone numbers, addresses, and specified terms.

☆11

Alternatives and similar repositories for website_extractor

Users that are interested in website_extractor are comparing it to the libraries listed below.

paulonteri / shule-s-frontend
School management, E-learning, plus Powerful Data & Communication Tools For Modern Schools. https://shulesuite.com
☆12 • Jun 13, 2021 • Updated 5 years ago
freeCodeCamp / boilerplate-SHA-1-password-cracker
☆16 • Feb 14, 2024 • Updated 2 years ago
geoffreynyaga / daraja
Daraja API tutorial using django/python in backend and React Native on frontend
☆25 • Dec 8, 2022 • Updated 3 years ago
smowtion / urlchecker
Urlchecker.org API Document ( Check 1000 files hosts,ziplink,encoded links and ads link. example: rapidgator,mefiafire,mega.nz,adf.ly and…
☆19 • Oct 10, 2017 • Updated 8 years ago
rudijs / amqp.node-rpc-factory
Node.js AMQP RPC Consumer and Publisher Factory
☆11 • Feb 14, 2022 • Updated 4 years ago
acheamponge / HTGAWM
How To Get Away With Murder - An NLP Project to analyze the textual data of 15 prominent Court cases involved in the Black Lives Matter M…
☆10 • Jun 14, 2020 • Updated 6 years ago
dial-once / node-rules-engine
A node.js module to check if an event array matches some specifications.
☆13 • Nov 4, 2016 • Updated 9 years ago
asciimoo / autodep
Install python dependencies automatically at runtime
☆13 • Feb 16, 2016 • Updated 10 years ago
anroots / sensu-stack
Sensu monitoring stack running in Docker on Docker Cloud
☆12 • Oct 25, 2016 • Updated 9 years ago
gregberge / node-taskman
Fast work queue based on redis.
☆28 • Jun 6, 2019 • Updated 7 years ago
dial-once / node-mongoat
MongoDB lightweight wrapper adding hooks (pre/post), auto createdAt/updatedAt, in a native MongoDB experience
☆14 • Mar 28, 2017 • Updated 9 years ago
n4xh4ck5 / Th4sD0m
Tool to identify all domains contained in an IP anonymously
☆15 • Jun 4, 2017 • Updated 9 years ago
TheOpenSpaceProgramArchive / urho-osp
Open Space Program in Urho3D
☆11 • Jun 15, 2020 • Updated 6 years ago
Klaus9090 / CarTube
YouTube for Android Auto without ROOT (rootless)
☆16 • Mar 28, 2021 • Updated 5 years ago
AnLoMinus / TeleHack
TeleHack - Telegram Tools, Packs, Bots, Info, Administration, Scrapers, Adders, Groups & Channels, and more..
☆32 • Apr 30, 2022 • Updated 4 years ago
matmill5 / ken-batcher-pp-ocr
Optical character recognition (OCR) project to catalog the work of PP-father - Kenneth E. Batcher
☆14 • Jan 12, 2020 • Updated 6 years ago
OndiekOchieng / Teksade-The-Tech-Community-HQ
Teksade is an open-source platform connecting developers with tech communities, built as a learning initiative for open-source contributi…
☆16 • Feb 26, 2024 • Updated 2 years ago
jsphpl / imap-email-address-collector
A python script to extract contact names and email addresses from all messages on an IMAP server.
☆15 • Nov 12, 2015 • Updated 10 years ago
skamsie / Domain-Status-Checker
Gets ip, http return code and domain name registrar of domains
☆13 • Jan 1, 2018 • Updated 8 years ago
aliarslan10 / Laravel-Eticaret
Laravel e-commerce system with virtual POS payment via the Paytr API
☆12 • May 31, 2017 • Updated 9 years ago
RickStrahl / Westwind.AI
☆11 • May 16, 2026 • Updated 2 months ago
ZiMADE / EmoKill
EmoKill is an Emotet process detection and killing tool for Windows OS. It avoids wasting time after detection of Emotet. Any process t…
☆13 • Dec 8, 2022 • Updated 3 years ago
bleachbit / docs
BleachBit documentation
☆12 • Jun 22, 2026 • Updated last month
aaronhoffman / WebsiteContactHarvester
Crawl websites for contact information. Extract email, phone, facebook, twitter.
☆18 • Oct 26, 2020 • Updated 5 years ago
FrancisFaure / vfp_tmlanguage_generator
VFP: Generate an extension for VS Code which provides support for the Visual FoxPro language
☆12 • Sep 17, 2017 • Updated 8 years ago
ashraf-minhaj / TotLa-a-Talking-Robot-in-Python
A fully functional Talking robot which can Listen, Decide and talk back.
☆13 • Sep 30, 2020 • Updated 5 years ago
paulonteri / django-serverless-cron
django-serverless-cron 🦡 A Django library with a simpler approach running cron jobs in a serverless environment through HTTP requests. T…
☆54 • Nov 7, 2022 • Updated 3 years ago
c-x / nmap-webshot
nmap nse script for web services screenshot
☆16 • Jul 22, 2013 • Updated 13 years ago
calvinmetcalf / parseDBF
☆12 • Oct 1, 2025 • Updated 9 months ago
malwares / Ebowla
Framework for Making Environmental Keyed Payloads
☆13 • Nov 1, 2016 • Updated 9 years ago
zayedaljaberi / urlfuzzing
Advance URL Fuzzing + Whois Domain running on python
☆19 • Nov 8, 2022 • Updated 3 years ago
azraelkuan / tensorflow_wavenet_vocoder
wavenet vocoder using tensorflow
☆26 • Feb 18, 2018 • Updated 8 years ago
oklyc / MetalCameraSample-master-2
☆13 • Jul 21, 2015 • Updated 11 years ago
ScriptTiger / Microsoft-Updates
If you just got a fresh Windows 10 and you don't know why your Internet stopped working around the same time, try this. These are scripts…
☆11 • Jan 15, 2025 • Updated last year
LV-Crew / Multiboot-USB-Stick
A multi-page Guide to create a Multiboot-USB-Stick for IT-Technicians that use Boot CDs on a daily basis.
☆16 • Jun 11, 2018 • Updated 8 years ago
ScriptTiger / Hosts-Conversions
Drag and drop a hosts file to convert it.
☆14 • Jan 15, 2025 • Updated last year
obdresource / OBDResource-Diagnostic-tool-Shop
BDResource Technology Co.,Ltd, engaged in making auto electrical diagnostic tools, such as X431,GM tech2,TMS374,Star 2000 Diagnostic Syst…
☆16 • Oct 18, 2012 • Updated 13 years ago
synt4x93 / android_device_samsung_starlte
☆10 • Sep 17, 2021 • Updated 4 years ago
magicdude4eva / mailwizz-nginx-seo
MailWizz NGINX example with search-engine friendly URLs and hardened security
☆18 • Jul 26, 2025 • Updated 11 months ago