seagatesoft/webdext

Intelligent Web Data Extractor

☆74

Alternatives and similar repositories for webdext

Users that are interested in webdext are comparing it to the libraries listed below.

TeamHG-Memex / url-summary
Show summary of a large number of URLs in a Jupyter Notebook
☆19 · Apr 8, 2026 · Updated 3 months ago
TeamHG-Memex / autopager
Detect and classify pagination links
☆107 · Apr 8, 2026 · Updated 3 months ago
scrapy / pypydispatcher
A fork of http://pydispatcher.sourceforge.net/ with PyPy support
☆16 · Jul 3, 2017 · Updated 9 years ago
xtannier / WebAnnotator
WebAnnotator is a tool for annotating Web pages. WebAnnotator is implemented as a Firefox extension (https://addons.mozilla.org/en-US/fi…
☆48 · Dec 17, 2021 · Updated 4 years ago
pydepta / pydepta
A Python implementation of DEPTA
☆84 · Jan 14, 2017 · Updated 9 years ago
matthewruttley / mozclassify
Algorithms for URL Classification
☆19 · Apr 13, 2015 · Updated 11 years ago
rmax / databrewer
The missing datasets manager. Like Homebrew but for datasets. CLI tool to search and discover datasets!
☆41 · May 29, 2017 · Updated 9 years ago
scrapinghub / aile
Automatic Item List Extraction
☆85 · Jun 15, 2016 · Updated 10 years ago
scrapinghub / product-extraction-benchmark
☆16 · Apr 10, 2026 · Updated 3 months ago
TeamHG-Memex / soft404
A classifier for detecting soft 404 pages
☆65 · Apr 8, 2026 · Updated 3 months ago
TeamHG-Memex / extract-html-diff
Extract the difference between two HTML pages
☆33 · Apr 8, 2026 · Updated 3 months ago
rkrzr / dataset-popular
A dataset of popular pages (taken from <dir.yahoo.com>) with manually marked up semantic blocks.
☆15 · Feb 9, 2014 · Updated 12 years ago
gogartom / TextMaps
☆91 · Jun 2, 2016 · Updated 10 years ago
tiefling-cat / ru-syntax
Repository for ru-syntax command line tool.
☆15 · Mar 8, 2022 · Updated 4 years ago
scrapinghub / page_clustering
A simple algorithm for clustering web pages, suitable for crawlers
☆33 · Mar 6, 2017 · Updated 9 years ago
syou6162 / go-active-learning
go-active-learning is a command line annotation tool for binary classification problems written in Go.
☆15 · Apr 3, 2021 · Updated 5 years ago
scrapinghub / mdr
A Python library to detect and extract listing data from HTML pages.
☆110 · May 5, 2017 · Updated 9 years ago
rmax / scrapydo
Crochet-based blocking API for Scrapy.
☆47 · Feb 24, 2017 · Updated 9 years ago
nik0spapp / sdalg
Web page segmentation and noise removal
☆55 · Feb 4, 2024 · Updated 2 years ago
scrapinghub / webpager
Paginating the web
☆37 · Feb 11, 2014 · Updated 12 years ago
hpclab / efficient-query-expansion
Official repository of "Efficient and Effective Query Expansion for Web Search", Short Paper @ CIKM 2018
☆15 · Nov 17, 2019 · Updated 6 years ago
clips / hades
Repository for the CLiPS HAte speech DEtection System [HADES].
☆25 · Apr 5, 2018 · Updated 8 years ago
scrapinghub / js2xml
Convert JavaScript code to an XML document
☆188 · Mar 14, 2022 · Updated 4 years ago
MohamedHmini / iww
AI-based web wrapper for web content extraction
☆102 · Feb 6, 2023 · Updated 3 years ago
dialogue-evaluation / morphoRuEval-2017
☆50 · Nov 20, 2017 · Updated 8 years ago
stummjr / scrapy-fieldstats
A Scrapy extension to log items coverage when the spider shuts down
☆18 · Apr 11, 2020 · Updated 6 years ago
unixpickle / rwa
RWA recurrent neural networks
☆18 · Apr 14, 2017 · Updated 9 years ago
AnAppAMonth / ScrapBook-AutoSave-Improved
An improved version of the ScrapBook AutoSave addon, with some extra features.
☆15 · Jan 23, 2011 · Updated 15 years ago
martin-arvidsson / InterpretableWordEmbeddings
This repository implements models described in "Interpretable Word Embeddings via Informative Priors"
☆11 · Aug 29, 2019 · Updated 6 years ago
ccorcos / meteor-react-mixin
Meteor mixin for React
☆13 · Apr 9, 2015 · Updated 11 years ago
ArturGaspar / scrapy-qtwebkit
☆13 · Dec 4, 2019 · Updated 6 years ago
Mega-DatA-Lab / SpectralLDA
Spectral LDA
☆13 · Jun 22, 2018 · Updated 8 years ago
croqaz / Stones
🗿Stones: Persistent key-value containers, compatible with Python dict
☆17 · Jul 15, 2024 · Updated 2 years ago
rahulwa / camouflage
An HTTP proxy server package
☆31 · Jun 15, 2017 · Updated 9 years ago
zytedata / zyte-spider-templates
Spider templates for automatic crawlers.
☆35 · Mar 26, 2026 · Updated 3 months ago
scrapinghub / autopager
Detect and classify pagination links
☆15 · Sep 9, 2020 · Updated 5 years ago
TeamHG-Memex / deep-deep
Adaptive crawler which uses Reinforcement Learning methods
☆167 · Apr 8, 2026 · Updated 3 months ago
ellej16 / SumMe
An abstractive summarizer for online news articles.
☆18 · Mar 25, 2015 · Updated 11 years ago
WladimirSidorenko / CRFSuite
Tree-Structured, First- and Higher-Order Linear Chain, and Semi-Markov CRFs
☆45 · Nov 14, 2019 · Updated 6 years ago