wanghaisheng/awesome-web-data-extractor

A curated list of promising web data extractor resources

☆31

Alternatives and similar repositories for awesome-web-data-extractor

Users who are interested in awesome-web-data-extractor are comparing it to the repositories listed below.

axtiva / flexible-graphql-php
SDL-first library for generating PHP code from GraphQL SDL into a TypeRegistry with webonyx/graphql-php
☆14 · Apr 15, 2026 · Updated 3 months ago
luka-dev / headless-task-server-php
Helper for sending requests to luka-dev/headless-task-server
☆10 · Apr 26, 2023 · Updated 3 years ago
q-m / scrapyd-k8s
Scrapyd on container infrastructure
☆16 · May 29, 2026 · Updated 2 months ago
mheinl / OnionCrawler
Scrapy spider to recursively crawl for Tor hidden services
☆11 · Oct 12, 2017 · Updated 8 years ago
belen-albeza / ldjam-36
Entry for Ludum Dare #36
☆12 · Dec 11, 2017 · Updated 8 years ago
rigby-sh / solace-medusa-starter-api
☆12 · Jun 18, 2026 · Updated last month
kjam / europarl_scraper
European Parliament website Python scraper
☆12 · Oct 19, 2016 · Updated 9 years ago
labrador-kennel / async-unit
A PHP8 unit and integration testing framework with first-class support for the @amphp Loop!
☆14 · Jan 28, 2026 · Updated 6 months ago
commondataio / awesome-opendata-software
Awesome list of software tools related to open data: data catalogs, ingestion tools, data prep tools, and so on
☆38 · Jul 6, 2026 · Updated 3 weeks ago
Nekmo / cookiecutter-django-backend
A cookiecutter for enterprise projects with support for Celery, Django Rest Framework and deployment with Ansible and Docker
☆13 · Oct 1, 2023 · Updated 2 years ago
NanoNets / nn-auto-bench
AutoBench: Benchmarking Automation for Intelligent Document Processing (IDP) with confidence
☆11 · Mar 18, 2025 · Updated last year
Sruiis / fix_broswer_enviroment_frame
Browser environment patching framework for personal use
☆21 · Jun 23, 2023 · Updated 3 years ago
GCaptainNemo / RAF2jpg
Convert Fuji raw images (.RAF) to .jpg images
☆11 · Oct 5, 2021 · Updated 4 years ago
2833844911 / cy-wasm-vmp
One-click packing (protective shell) for wasm files
☆23 · May 29, 2026 · Updated 2 months ago
ielab / sigir2018-health-search-tutorial
Repository for the Health Search Tutorial
☆12 · Aug 27, 2018 · Updated 7 years ago
skydiver / downloadstation-cli
Manage your Synology Download Station from your terminal
☆10 · Jan 7, 2023 · Updated 3 years ago
Chronasorg / chronas-api
This API provides authentication and CRUD operations for data used by the Chronas application
☆14 · Jul 21, 2026 · Updated last week
nurulid / nurul-bento-profile
Bento profile, made with Tailwind CSS.
☆10 · Jul 17, 2024 · Updated 2 years ago
wangjingyu001 / my_js_parser
☆24 · Jan 22, 2025 · Updated last year
wor / ds-down
Sends URLs and files to Synology DownloadStation for download
☆12 · Nov 21, 2014 · Updated 11 years ago
aryehraber / statamic-location
Statamic v2 Addon to find locations using Google Maps autocomplete.
☆13 · Mar 29, 2020 · Updated 6 years ago
ozansener / RecipeWatch
☆12 · Jan 12, 2016 · Updated 10 years ago
chordify / HarmTrace-Base
HarmTrace Base: Parsing and unambiguously representing musical chords
☆11 · Oct 21, 2022 · Updated 3 years ago
bostdiek / PublicWeaklySupervised
(Machine) Learning to Do More with Less
☆14 · Jun 11, 2018 · Updated 8 years ago
jacobandreas / rnn-syn
Analogs of Linguistic Structure in Deep Representations
☆19 · Jul 27, 2017 · Updated 9 years ago
the-coder-o / portfolios.world
Best place to find portfolio inspiration. Browse our curated collection of 309+ exceptional designs to help you create your best portfolio…
☆13 · Jan 27, 2025 · Updated last year
hirmeos / entity-fishing-client-python
Repository hosting the common code for the entity-fishing clients
☆10 · May 18, 2026 · Updated 2 months ago
gtuk / rotoxy
A rotating Tor proxy service that starts a configurable number of Tor SOCKS proxies and exposes them under one reverse proxy
☆12 · Jun 7, 2021 · Updated 5 years ago
jdvala / lazytext
LazyText is inspired by the idea of lazypredict, a library which helps build a lot of basic models without much code. LazyText is for text …
☆18 · Feb 19, 2022 · Updated 4 years ago
simonw / datasette-insert
Datasette plugin for inserting and updating data
☆20 · Mar 29, 2024 · Updated 2 years ago
koganei / sudolang-rest-api
A NodeJS and/or Flask backend written in Sudolang
☆19 · Apr 30, 2023 · Updated 3 years ago
microsoft / un-knowledge-extraction
The goal is to pilot Microsoft Cognitive Services to unlock the strategic value of UN unstructured content by building on AI and semantic…
☆16 · Jul 6, 2023 · Updated 3 years ago
hint-lab / doctrack
Dataset for EMNLP'23 Paper "DocTrack: A Visually-Rich Document Dataset Really Aligned with Human Eye Movement for Machine Reading"
☆11 · Oct 25, 2023 · Updated 2 years ago
collectionspace / cspace-installer
The installer provides an Ansible playbook for setting up CollectionSpace on an Ubuntu server.
☆11 · Aug 26, 2023 · Updated 2 years ago
huridocs / pdf-reading-order
☆16 · Apr 26, 2024 · Updated 2 years ago
spro / torch-seq2seq
Word-level sequence to sequence RNN for translation
☆10 · Apr 7, 2017 · Updated 9 years ago
fei3ei / qbdi-trace
A trace tool based on QBDI
☆26 · Nov 11, 2025 · Updated 8 months ago
mawildoer / delta_bot
☆10 · Dec 10, 2020 · Updated 5 years ago
unixpickle / rwa
RWA recurrent neural networks
☆17 · Apr 14, 2017 · Updated 9 years ago