Simplified version of a Common Crawl fetcher
☆17 · Dec 24, 2025 · Updated 3 months ago
Alternatives and similar repositories for commoncrawl-fetcher-lite
Users that are interested in commoncrawl-fetcher-lite are comparing it to the libraries listed below.
- Standalone versions of LUCENE_5205 and other patches: SpanQueryParser, Concordance and Co-occurrence stats ☆18 · Aug 2, 2021 · Updated 4 years ago
- A DropWizard wrapper around Apache Tika. ☆10 · Dec 22, 2016 · Updated 9 years ago
- USB HID driver emulation with PID/VID (0x3bca/0x27bb) of Plenom A/S Busylight Alpha, which is supported by Mimikatz. When mimikatz is exec… ☆21 · Sep 6, 2022 · Updated 3 years ago
- Single server/laptop grade file-observatory ☆10 · Mar 30, 2023 · Updated 2 years ago
- Trigger an LLM in your CI/CD to auto-complete your work ☆11 · Apr 5, 2023 · Updated 2 years ago
- File-tests is a test suite for the File tool. Previous home: https://fedorahosted.org/file-tests/ ☆21 · Dec 18, 2025 · Updated 3 months ago
- Continuous build system used by Mono and Moonlight. ☆34 · Apr 8, 2020 · Updated 5 years ago
- Continuous Meme Delivery ☆12 · Dec 7, 2022 · Updated 3 years ago
- Export WAV audio files from VALORANT ☆11 · Aug 1, 2023 · Updated 2 years ago
- Software in this repository is not maintained anymore ☆11 · Jul 6, 2022 · Updated 3 years ago
- Synapse Rapid Power-up for SinkDB ☆11 · Jun 24, 2025 · Updated 9 months ago
- Efficient Message Digest for MXF Files ☆10 · Jul 6, 2020 · Updated 5 years ago
- Miscellaneous small bits and bobs. ☆11 · Sep 8, 2025 · Updated 6 months ago
- An open source route planning library and server using OpenStreetMap. ☆13 · Mar 4, 2026 · Updated 3 weeks ago
- Efficient indexing and retrieval of OCR bounding boxes in Solr ☆22 · Mar 13, 2019 · Updated 7 years ago
- Firefox Sync Server Docker Container ☆10 · Sep 23, 2022 · Updated 3 years ago
- A small tool which uses the CommonCrawl URL Index to download documents with certain file types or mime-types. This is used for mass-test… ☆73 · Jan 16, 2026 · Updated 2 months ago
- Advanced desktop search/corpus exploration prototype ☆21 · Jun 23, 2021 · Updated 4 years ago
- Automatically spider the result set of a Censys/Shodan search and download all files where the file name or folder path matches a regex. ☆28 · Apr 22, 2023 · Updated 2 years ago
- A static site generator where your website source code becomes a runnable block of JavaScript. ☆11 · Mar 14, 2024 · Updated 2 years ago
- Solving CAPTCHA with Image Classification ☆10 · Mar 13, 2025 · Updated last year
- The Unix line editor ☆17 · Feb 23, 2026 · Updated last month
- Implementation of Stable Diffusion from scratch [WORK IN PROGRESS] ☆22 · Feb 18, 2023 · Updated 3 years ago
- Tools for preservation of floppy disks ☆15 · Updated this week
- ☆12 · Jun 24, 2025 · Updated 9 months ago
- A library of examples showing how to use the Common Crawl corpus (2008-2012, ARC format) ☆65 · Aug 5, 2016 · Updated 9 years ago
- IETF L4S Deployment Design Recommendations ☆20 · Jan 27, 2026 · Updated 2 months ago
- Open-source web application to keep track of all data processing activities required by GDPR Article 30 "Records of processing activiti… ☆24 · Apr 21, 2023 · Updated 2 years ago
- PetaTest is a tiny but powerful, embeddable, dependency-free unit testing framework for .NET and Mono. ☆13 · Jul 23, 2018 · Updated 7 years ago
- This Python tool uses PimEyes for reverse image searches, returning links to pages where matches are found, useful for investigations and… ☆11 · Sep 12, 2024 · Updated last year
- Java library for parsing podcast feed XML files ☆11 · Feb 23, 2024 · Updated 2 years ago
- This repository tracks the changes to the "Unix Timesharing System" paper written by Dennis Ritchie and Ken Thompson. ☆11 · Oct 6, 2018 · Updated 7 years ago
- A script to automate the creation of cloud infrastructure for hash cracking. ☆15 · Sep 4, 2019 · Updated 6 years ago
- Customizable cellular automaton simulator ☆12 · Mar 14, 2025 · Updated last year
- Gradle plugin for JSON validation ☆10 · Dec 19, 2021 · Updated 4 years ago
- A collection of awesome projects in the turfjs ecosystem ☆25 · Dec 9, 2020 · Updated 5 years ago
- ☆11 · May 7, 2020 · Updated 5 years ago
- Diving into Popularity of GitHub Repositories ☆13 · Sep 15, 2023 · Updated 2 years ago
- A museum of historical and modern regular expression engines, showing their development and influence ☆23 · Dec 26, 2025 · Updated 3 months ago