TeamHG-Memex/Formasaurus

Formasaurus tells you the type of an HTML form and its fields using machine learning

☆119

Alternatives and similar repositories for Formasaurus

Users who are interested in Formasaurus compare it to the repositories listed below.

TeamHG-Memex / autologin
A project to attempt to automatically log in to a website given a single seed
☆128 · Feb 23, 2026 · Updated last week
TeamHG-Memex / MaybeDont
A component that tries to avoid downloading duplicate content
☆27 · Feb 10, 2026 · Updated 3 weeks ago
TeamHG-Memex / autopager
Detect and classify pagination links
☆105 · Feb 10, 2026 · Updated 3 weeks ago
scrapinghub / webstruct
NER toolkit for HTML data
☆259 · May 3, 2024 · Updated last year
TeamHG-Memex / scrapy-dockerhub
[UNMAINTAINED] Deploy, run and monitor your Scrapy spiders.
☆11 · Feb 23, 2026 · Updated last week
TeamHG-Memex / autologin-middleware
Scrapy middleware for the autologin
☆36 · Feb 10, 2026 · Updated 3 weeks ago
TeamHG-Memex / deep-deep
Adaptive crawler which uses Reinforcement Learning methods
☆168 · Feb 10, 2026 · Updated 3 weeks ago
Sotera / Datawake
Browser add-on and web server to support collection and analysis of web browsing data.
☆14 · Mar 9, 2016 · Updated 9 years ago
TeamHG-Memex / soft404
A classifier for detecting soft 404 pages
☆58 · Feb 10, 2026 · Updated 3 weeks ago
TeamHG-Memex / html-text
Extract text from HTML
☆134 · Feb 10, 2026 · Updated 3 weeks ago
nasa-jpl-memex / topic_space
Topic modeling web application
☆40 · Jul 23, 2015 · Updated 10 years ago
povilasb / scrapy-html-storage
Scrapy downloader middleware that stores response HTMLs to disk.
☆18 · Jan 14, 2026 · Updated last month
mitll / vizlinc
Vizlinc
☆15 · Jan 14, 2016 · Updated 10 years ago
mitll / MITIE
MITIE: library and tools for information extraction
☆29 · Jan 22, 2015 · Updated 11 years ago
scrapinghub / mdr
A Python library to detect and extract listing data from HTML pages.
☆108 · May 5, 2017 · Updated 8 years ago
scrapinghub / webpager
Paginating the web
☆37 · Feb 11, 2014 · Updated 12 years ago
TeamHG-Memex / arachnado
Web Crawling UI and HTTP API, based on Scrapy and Tornado
☆160 · Feb 10, 2026 · Updated 3 weeks ago
mitll / topic-clustering
☆44 · Jan 15, 2016 · Updated 10 years ago
TeamHG-Memex / undercrawler
A generic crawler
☆78 · Feb 10, 2026 · Updated 3 weeks ago
VIDA-NYU / domain_discovery_tool_deprecated
Seed acquisition tool to bootstrap focused crawlers
☆23 · Apr 24, 2017 · Updated 8 years ago
NextCenturyCorporation / dig
Faceted search engine for domain-specific exploration of the Web
☆45 · Feb 10, 2017 · Updated 9 years ago
nasa-jpl-memex / image_space
Interactive Image similarity and Visual Search and Retrieval application
☆95 · Apr 16, 2024 · Updated last year
TeamHG-Memex / page-compare
Simple heuristic for measuring web page similarity (& data set)
☆90 · Feb 23, 2026 · Updated last week
scrapinghub / autoextract-spiders
Pre-built Scrapy spiders for AutoExtract
☆19 · Apr 24, 2024 · Updated last year
TeamHG-Memex / imageSimilarity
Given a new image, determine if it is likely derived from a known image.
☆20 · Feb 10, 2026 · Updated 3 weeks ago
TeamHG-Memex / aquarium
Splash + HAProxy + Docker Compose
☆195 · Feb 10, 2026 · Updated 3 weeks ago
pymonger / facetview-memex
Facet Search interface for MEMEX.
☆13 · Feb 26, 2015 · Updated 11 years ago
mitll / graph-qube
Pattern-of-Behavior Search Tool
☆11 · Jun 20, 2022 · Updated 3 years ago
scrapinghub / skinfer
Skinfer is a tool for inferring and merging JSON schemas
☆141 · Apr 24, 2024 · Updated last year
scrapy-plugins / scrapy-jsonschema
Scrapy schema validation pipeline and Item builder using JSON Schema
☆45 · Mar 26, 2021 · Updated 4 years ago
scrapinghub / scrapy-poet
Page Object pattern for Scrapy
☆127 · Jan 28, 2026 · Updated last month
chrismattmann / imagecat
ImageCat is an Apache OODT RADIX application that uses Apache Solr, Apache Tika and Apache OODT to ingest 10s of millions of files (image…
☆95 · Aug 26, 2018 · Updated 7 years ago
scrapy / scrapely
A pure-python HTML screen-scraping library
☆1,886 · Apr 4, 2022 · Updated 3 years ago
scrapinghub / scrapy-autounit
Automatic unit test generation for Scrapy.
☆57 · Jul 12, 2021 · Updated 4 years ago
TeamHG-Memex / docker-tor-rotator
A rotating socks proxy using Tor, Delegate and Haproxy
☆13 · Feb 10, 2026 · Updated 3 weeks ago
Kitware / SMQTK
Python toolkit for pluggable algorithms and data structures for multimedia-based machine learning.
☆77 · Jul 28, 2025 · Updated 7 months ago
ericwhyne / darpa_open_catalog
Meta information for the DARPA open catalog project.
☆56 · Nov 16, 2017 · Updated 8 years ago
unchartedsoftware / aperture-tiles
Aperture-Tiles uses familiar web-based map interactions to allow exploration of arbitrary huge data sets.
☆74 · May 23, 2023 · Updated 2 years ago
draperlaboratory / user-ale
The User Activity Logging Engine, or User-ALE, is a logging mechanism used to quantitatively assess the behavioural and cognitive state o…
☆13 · Aug 26, 2016 · Updated 9 years ago