tballison/commoncrawl-fetcher-lite

Simplified version of a Common Crawl fetcher

☆16

Alternatives and similar repositories for commoncrawl-fetcher-lite

Users who are interested in commoncrawl-fetcher-lite are comparing it to the repositories listed below.

tballison / lucene-addons
Standalone versions of LUCENE_5205 and other patches: SpanQueryParser, Concordance and Co-occurrence stats
☆18 · Aug 2, 2021 · Updated 4 years ago
mattflax / dropwizard-tika-server
A DropWizard wrapper around Apache Tika.
☆10 · Dec 22, 2016 · Updated 9 years ago
tballison / file-observatory
Single server/laptop grade file-observatory
☆10 · Mar 30, 2023 · Updated 3 years ago
nccgroup / mimikatz-detector-busylight
USB HID driver emulation with PID/VID (0x3bca/0x27bb) of Plenom A/S Busylight Alpha, that is supported by Mimikatz. When mimikatz is exec…
☆21 · Sep 6, 2022 · Updated 3 years ago
dmarx / the-rest-of-the-fucking-owl
Trigger an LLM in your CI/CD to auto-complete your work
☆11 · Apr 5, 2023 · Updated 3 years ago
priamai / sigmatau
An extension of the sigma standard to include security metrics.
☆17 · May 18, 2023 · Updated 3 years ago
file / file-tests
File-tests is a test suite for the File tool. Previous home: https://fedorahosted.org/file-tests/
☆21 · Jun 3, 2026 · Updated last month
agilebirds / openflexo
Software in this repository is not maintained anymore
☆11 · Jul 6, 2022 · Updated 4 years ago
mitre / rhapsode
Advanced desktop search/corpus exploration prototype
☆21 · Jun 23, 2021 · Updated 5 years ago
NotToDisturb / AudioExporter
Export WAV audio files from VALORANT
☆11 · Aug 1, 2023 · Updated 2 years ago
MrPowerScripts / meme-cd
Continuous Meme Delivery
☆12 · Dec 7, 2022 · Updated 3 years ago
MisterSpyx / SMTP-CHECKER
☆11 · Jan 14, 2021 · Updated 5 years ago
simonrdavies / NapierOne
NapierOne. A Publicly Available Modern Mixed File Data Set. The data set is suitable for a variety of testing scenarios such as Ransomwar…
☆25 · Jan 25, 2022 · Updated 4 years ago
captainGeech42 / synapse-sinkdb
Synapse Rapid Power-up for SinkDB
☆11 · Jun 24, 2025 · Updated last year
dbmdz / solr-ocrpayload-plugin
Efficient indexing and retrieval of OCR bounding boxes in Solr
☆22 · Mar 13, 2019 · Updated 7 years ago
PackageFoundation / yap
Package software with ease 📦 Versatile deb, rpm and apk packager fueled by PKGBUILD specfiles and golang
☆13 · Mar 4, 2024 · Updated 2 years ago
cinecert / mxf-digest
Efficient Message Digest for MXF Files
☆10 · Jul 6, 2020 · Updated 6 years ago
scarybeasts / misc
Miscellaneous small bits and bobs.
☆11 · Sep 8, 2025 · Updated 10 months ago
4m3rr0r / zero-setup
Zero Setup is a Bash script that automates the installation process of all the personal tools and software you need on your system. It s…
☆13 · Nov 19, 2023 · Updated 2 years ago
centic9 / CommonCrawlDocumentDownload
A small tool which uses the CommonCrawl URL Index to download documents with certain file types or mime-types. This is used for mass-test…
☆74 · Jul 11, 2026 · Updated 2 weeks ago
geofabrik / graphhopper
An open source route planning library and server using OpenStreetMap.
☆13 · May 26, 2026 · Updated 2 months ago
montysecurity / InfraSpyder
Automatically spider the result set of a Censys/Shodan search and download all files where the file name or folder path matches a regex.
☆29 · Apr 22, 2023 · Updated 3 years ago
snw35 / ffsync
Firefox Sync Server Docker Container
☆10 · Sep 23, 2022 · Updated 3 years ago
squaresapp / strawjs
A static site generator where your website source code becomes a runnable block of JavaScript.
☆11 · Mar 14, 2024 · Updated 2 years ago
xrsrke / stable-diffusion-from-scratch
Implementation of Stable Diffusion from scratch [WORK IN PROGRESS]
☆22 · Feb 18, 2023 · Updated 3 years ago
sensepost / capchan
Solving CAPTCHA with Image Classification
☆10 · Mar 13, 2025 · Updated last year
nthdeveloper / NthTelnetServer
Very simple Telnet server written in C#. You can add your own commands and enable password control for connecting to the telnet server. V…
☆14 · Feb 7, 2018 · Updated 8 years ago
jlivingood / IETF-L4S-Deployment
IETF L4S Deployment Design Recommendations
☆21 · May 19, 2026 · Updated 2 months ago
toptensoftware / PetaTest
PetaTest is a tiny but powerful, embeddable, dependency-free unit testing framework for .NET and Mono.
☆13 · Jul 23, 2018 · Updated 8 years ago
pluribus-one / gdpr-registry-app
Open-source web application to keep track of all data processing activities prefigured by GDPR Article 30 "Records of processing activiti…
☆24 · Apr 21, 2023 · Updated 3 years ago
mdewilde / podcast-parser
Java library for parsing podcast feed XML files
☆11 · Feb 23, 2024 · Updated 2 years ago
Dynatrace / agent-nodejs
Dynatrace agent for PaaS environments
☆15 · Dec 15, 2025 · Updated 7 months ago
trothtech / xfl
☆13 · Jun 24, 2025 · Updated last year
Jertzukka / xenforo-scraper
Media scraper for Xenforo-forums written in Python.
☆24 · Mar 20, 2024 · Updated 2 years ago
DoctorWkt / unix_timesharing_paper
This repository tracks the changes to the "Unix Timesharing System" paper written by Dennis Ritchie and Ken Thompson.
☆11 · Oct 6, 2018 · Updated 7 years ago
kamilsarelo / dynatrace-time-tracking
☆11 · May 7, 2020 · Updated 6 years ago
alenkacz / gradle-json-validator
Gradle plugin for JSON validation
☆10 · Dec 19, 2021 · Updated 4 years ago
vbrajon / cutjs.com
🔪 Shortcut Utils
☆18 · Feb 9, 2026 · Updated 5 months ago
tedbyron / golem
Customizable cellular automaton simulator
☆12 · Mar 14, 2025 · Updated last year