Fast and robust date extraction from web pages, with Python or on the command-line
☆146 · Nov 4, 2025 · Updated 4 months ago
Alternatives and similar repositories for htmldate
Users who are interested in htmldate are comparing it to the libraries listed below.
- Clean, filter and sample URLs to optimize data collection – Python & command-line – Deduplication, spam, content and language filters ☆161 · Dec 19, 2025 · Updated 3 months ago
- Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XM… ☆5,517 · Sep 12, 2025 · Updated 6 months ago
- SFST/SMOR/DWDS-based German Morphology ☆21 · Updated this week
- Automatically extracts and normalizes an online article or blog post publication date ☆119 · Aug 10, 2023 · Updated 2 years ago
- Article extraction benchmark: dataset and evaluation scripts ☆356 · Mar 1, 2026 · Updated 3 weeks ago
- Simple multilingual lemmatizer for Python, especially useful for speed and efficiency ☆188 · Jun 6, 2025 · Updated 9 months ago
- Python port of Boilerpipe library ☆96 · Aug 20, 2024 · Updated last year
- A lexical normalizer for historical spelling variants using a transformer architecture. ☆10 · Mar 12, 2025 · Updated last year
- A library to extract a publication date from a web page, along with a measure of the accuracy. ☆41 · Aug 13, 2019 · Updated 6 years ago
- Multi Tier Annotation Search ☆12 · May 13, 2024 · Updated last year
- FairCopy is a word processor for the humanities scholar. ☆13 · Jan 26, 2026 · Updated last month
- texrex web page cleaning & ClaraX random walk crawler ☆11 · Dec 13, 2021 · Updated 4 years ago
- Next-generation Punkt sentence boundary detection with zero dependencies ☆29 · Nov 18, 2025 · Updated 4 months ago
- A parser, formatter, validator, and language server for SQLite SQL. Built on SQLite's own grammar and tokenizer ☆85 · Updated this week
- Python tool to support lazy imports. ☆31 · Jun 9, 2025 · Updated 9 months ago
- Comparing warc files ☆17 · Feb 21, 2019 · Updated 7 years ago
- GC4LM: A Colossal (Biased) language model for German ☆13 · May 2, 2021 · Updated 4 years ago
- An extensive Python library for dealing with FoLiA (Format for Linguistic Annotation) documents, a rich XML-based format for linguistic a… ☆18 · Nov 18, 2024 · Updated last year
- pre-commit hook for Djade. ☆13 · Feb 27, 2026 · Updated 3 weeks ago
- Common Voice Dataset explorer ☆27 · Jul 4, 2022 · Updated 3 years ago
- A .decorate(fn) method for Django QuerySets, for clever lazily evaluated expressions. ☆17 · Jul 3, 2012 · Updated 13 years ago
- KB data lab ☆10 · Dec 8, 2020 · Updated 5 years ago
- Pyinfer is a model agnostic tool for ML developers and researchers to benchmark the inference statistics for machine learning models or f… ☆24 · Feb 19, 2021 · Updated 5 years ago
- DM is an environment for the study and annotation of images and texts. It is a suite of tools, enabling scholars to gather and organize t… ☆19 · Dec 10, 2018 · Updated 7 years ago
- SQL functions for calling OpenAI APIs ☆22 · Jan 14, 2023 · Updated 3 years ago
- Code and data for the WSDM '21 paper "Quotebank: A Corpus of Quotations from a Decade of News" ☆21 · Jul 23, 2021 · Updated 4 years ago
- An implementation of GrASP (Shnarch et al., 2017) ☆23 · Aug 29, 2022 · Updated 3 years ago
- Datasette plugin for inserting and updating data ☆20 · Mar 29, 2024 · Updated last year
- Text generation using language models with multiple exit heads ☆16 · Sep 18, 2025 · Updated 6 months ago
- Python SDK for One AI APIs. One AI is an NLP-as-a-service platform. Our APIs enables language comprehension in context, transforming text… ☆39 · Aug 24, 2023 · Updated 2 years ago
- ☆16 · Oct 16, 2024 · Updated last year
- ☆18 · Feb 28, 2022 · Updated 4 years ago
- RaKUn 2.0 - A fast keyword detection algorithm ☆72 · Aug 5, 2025 · Updated 7 months ago
- A Python 3 compatible version of goose http://goose3.readthedocs.io/en/latest/index.html ☆904 · Feb 6, 2026 · Updated last month
- A repository containing the details of natural language inference dataset in Hindi ☆14 · Dec 28, 2020 · Updated 5 years ago
- Just the facts -- web page content extraction ☆1,279 · Jul 8, 2025 · Updated 8 months ago
- Spellchecker service based on hunspell for 90 languages ☆10 · Oct 26, 2020 · Updated 5 years ago
- ☆13 · Nov 28, 2025 · Updated 3 months ago
- Dataset of sentences from Hindi stories tagged with different emotion tags ☆11 · Nov 26, 2019 · Updated 6 years ago