Fast and robust date extraction from web pages, with Python or on the command-line
☆146 · Nov 4, 2025 · Updated 5 months ago
Alternatives and similar repositories for htmldate
Users who are interested in htmldate are comparing it to the libraries listed below.
- Clean, filter and sample URLs to optimize data collection – Python & command-line – Deduplication, spam, content and language filters ☆164 · Dec 19, 2025 · Updated 3 months ago
- Small string compression using smaz compression algorithm. Fast, because it's in C. Supports Python 3+ ☆13 · Oct 18, 2025 · Updated 5 months ago
- Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XM… ☆5,659 · Sep 12, 2025 · Updated 6 months ago
- A Corpus Data Retrieval Index using Lucene for Look-Ups ☆20 · Mar 30, 2026 · Updated last week
- Automatically extracts and normalizes an online article or blog post publication date ☆119 · Aug 10, 2023 · Updated 2 years ago
- Article extraction benchmark: dataset and evaluation scripts ☆361 · Mar 1, 2026 · Updated last month
- Simple multilingual lemmatizer for Python, especially useful for speed and efficiency ☆190 · Jun 6, 2025 · Updated 10 months ago
- Python port of Boilerpipe library ☆96 · Aug 20, 2024 · Updated last year
- A library to extract a publication date from a web page, along with a measure of the accuracy. ☆41 · Aug 13, 2019 · Updated 6 years ago
- Multi Tier Annotation Search ☆12 · May 13, 2024 · Updated last year
- Stata version of R staggered package; see Roth and Sant’Anna (2023) ☆18 · Feb 5, 2025 · Updated last year
- Next-generation Punkt sentence boundary detection with zero dependencies ☆30 · Nov 18, 2025 · Updated 4 months ago
- Neural network based lemmatizer for Finnish language ☆11 · Sep 10, 2020 · Updated 5 years ago
- Python tool to support lazy imports. ☆31 · Jun 9, 2025 · Updated 10 months ago
- New Version of CSDID. All in Mata ☆12 · Mar 30, 2023 · Updated 3 years ago
- GC4LM: A Colossal (Biased) language model for German ☆13 · May 2, 2021 · Updated 4 years ago
- Common Voice Dataset explorer ☆27 · Jul 4, 2022 · Updated 3 years ago
- [ACL'24 Oral] Analysing The Impact of Sequence Composition on Language Model Pre-Training ☆23 · Aug 18, 2024 · Updated last year
- KB data lab ☆10 · Dec 8, 2020 · Updated 5 years ago
- Pyinfer is a model agnostic tool for ML developers and researchers to benchmark the inference statistics for machine learning models or f… ☆24 · Feb 19, 2021 · Updated 5 years ago
- Highly opinionated linter for Trio code ☆27 · Updated this week
- A Python library to convert KML files to GeoJSON files ☆16 · Mar 12, 2026 · Updated 3 weeks ago
- SQL functions for calling OpenAI APIs ☆22 · Jan 14, 2023 · Updated 3 years ago
- Code and data for the WSDM '21 paper "Quotebank: A Corpus of Quotations from a Decade of News" ☆22 · Jul 23, 2021 · Updated 4 years ago
- Explanation-centered inference for question answering ☆16 · Feb 7, 2018 · Updated 8 years ago
- A Trello app providing a series of useful views of your data. ☆33 · Mar 18, 2014 · Updated 12 years ago
- 100k+ topic labeled news articles published from thousands of news websites ☆19 · Aug 18, 2020 · Updated 5 years ago
- Text generation using language models with multiple exit heads ☆16 · Sep 18, 2025 · Updated 6 months ago
- Repository for the implementation of csdid and drdid ☆20 · Mar 30, 2023 · Updated 3 years ago
- Useful abstractions for trio ☆11 · Aug 12, 2020 · Updated 5 years ago
- RaKUn 2.0 - A fast keyword detection algorithm ☆72 · Aug 5, 2025 · Updated 8 months ago
- A Python 3 compatible version of goose http://goose3.readthedocs.io/en/latest/index.html ☆905 · Apr 1, 2026 · Updated last week
- Neural models for detecting and masking personal information from texts ☆16 · Nov 25, 2022 · Updated 3 years ago
- Just the facts -- web page content extraction ☆1,276 · Jul 8, 2025 · Updated 9 months ago
- Dataset of sentences from Hindi stories tagged with different emotion tags ☆11 · Nov 26, 2019 · Updated 6 years ago
- Caching for HTTPX ☆73 · Oct 6, 2025 · Updated 6 months ago
- QLoRA for Masked Language Modeling ☆23 · Sep 11, 2023 · Updated 2 years ago
- Command line tool for digging into WARC files ☆51 · Mar 31, 2026 · Updated last week
- Micro-framework for Python RPC+Pub/Sub over WebSockets ☆13 · Apr 4, 2016 · Updated 10 years ago