Framework for evaluating text extraction algorithms implemented as web services
☆42 · Jun 30, 2012 · Updated 13 years ago
Alternatives and similar repositories for Text-Extraction-Evaluation
Users who are interested in Text-Extraction-Evaluation are comparing it to the libraries listed below.
- A platform for collecting, analyzing, and visualizing social media data. ☆13 · Dec 27, 2020 · Updated 5 years ago
- Technical question answering NLP bot ☆13 · Sep 16, 2009 · Updated 16 years ago
- https://mimesniff.spec.whatwg.org/ implementation for Python ☆13 · Jan 16, 2024 · Updated 2 years ago
- Discontinuous Data-Oriented Parsing ☆46 · Jan 5, 2024 · Updated 2 years ago
- (Archived) A Python library for record linkage and deduplication. ☆19 · Mar 19, 2024 · Updated last year
- A simple bloom filter for SQLite using Murmur3 ☆18 · Sep 13, 2011 · Updated 14 years ago
- Analyze standard numbers like ARK, DOI, EAN, GTIN, IBAN, ISAN, ISBN, ISMN, ISNI, ISSN, ISTC, ISWC, ORCID, PPN, SICI, UPC, ZDB with Elasti… ☆24 · Jul 5, 2016 · Updated 9 years ago
- IWNLP: A parser for the German edition of Wiktionary ☆13 · Jul 28, 2023 · Updated 2 years ago
- ☆18 · Jun 24, 2017 · Updated 8 years ago
- ☆20 · Jan 14, 2026 · Updated last month
- Python client for Zyte API ☆28 · Feb 10, 2026 · Updated 2 weeks ago
- Preprocess text for NLP (tokenizing, lowercasing, stemming, sentence splitting, etc.) ☆29 · Jun 7, 2011 · Updated 14 years ago
- ... just because nltk is too heavy ☆35 · Jul 21, 2010 · Updated 15 years ago
- A compound splitter based on the semantic regularities in the vector space of word embeddings. ☆16 · Mar 15, 2017 · Updated 8 years ago
- RWA recurrent neural networks ☆17 · Apr 14, 2017 · Updated 8 years ago
- Semanticizest: dump parser and client ☆20 · May 11, 2016 · Updated 9 years ago
- Statistical spell- and (occasional) grammar-checker. ☆18 · Nov 20, 2024 · Updated last year
- Library for annotation-based dependency injection ☆24 · Dec 9, 2025 · Updated 2 months ago
- Python bindings for html5ever, using CFFI ☆40 · Nov 9, 2017 · Updated 8 years ago
- Simple heuristic for measuring web page similarity (& data set) ☆90 · Feb 23, 2026 · Updated last week
- A python implementation of DEPTA ☆83 · Jan 14, 2017 · Updated 9 years ago
- Dalphi - Active Learning Platform for Human Interaction ☆23 · Aug 20, 2018 · Updated 7 years ago
- Keeps a mirror of DBpedia live in sync ☆27 · Sep 20, 2021 · Updated 4 years ago
- extract difference between two html pages ☆32 · Feb 10, 2026 · Updated 2 weeks ago
- Web Content Extraction Through Machine Learning ☆185 · Apr 4, 2014 · Updated 11 years ago
- A parser and autocorrection tool for wiktionary. ☆39 · Dec 4, 2015 · Updated 10 years ago
- Automatically extracts and normalizes an online article or blog post publication date ☆119 · Aug 10, 2023 · Updated 2 years ago
- Spider templates for automatic crawlers. ☆34 · Jan 8, 2026 · Updated last month
- *Deprecated* A fast and accurate part-of-speech tagger for TextBlob. ☆101 · Nov 9, 2015 · Updated 10 years ago
- Just the facts -- web page content extraction ☆1,280 · Jul 8, 2025 · Updated 7 months ago
- Source code for Jordan Boyd-Graber's academic webpage. ☆11 · Updated this week
- Slides/code for the Lisbon machine learning school 2017 ☆28 · Jul 27, 2017 · Updated 8 years ago
- A collection of github workflow patterns ☆10 · Feb 1, 2024 · Updated 2 years ago
- Wireless Brother KH-9xx knitting machine connection ☆13 · Sep 3, 2016 · Updated 9 years ago
- A framework, data and configs for generating and building Tesseract OCR lang.traineddata model files, specifically for Japanese ☆10 · Dec 9, 2013 · Updated 12 years ago
- Web content extraction using machine learning ☆34 · Mar 3, 2021 · Updated 4 years ago
- WebAnnotator is a tool for annotating Web pages. WebAnnotator is implemented as a Firefox extension (https://addons.mozilla.org/en-US/fi… ☆48 · Dec 17, 2021 · Updated 4 years ago
- Structured Data Extractor. An application to extract structured data from web pages. It uses Data Extraction Based on Partial Tree Alignm… ☆49 · Jun 9, 2012 · Updated 13 years ago
- A tool to segment text based on frequencies and the Viterbi algorithm "#TheBoyWhoLived" => ['#', 'The', 'Boy', 'Who', 'Lived'] ☆81 · Apr 23, 2016 · Updated 9 years ago