Fast and robust date extraction from web pages, with Python or on the command-line
☆149 Nov 4, 2025 Updated 6 months ago
Alternatives and similar repositories for htmldate
Users that are interested in htmldate are comparing it to the libraries listed below.
- Clean, filter and sample URLs to optimize data collection – Python & command-line – Deduplication, spam, content and language filters ☆171 Dec 19, 2025 Updated 5 months ago
- Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XM… ☆5,970 Sep 12, 2025 Updated 8 months ago
- SFST/SMOR/DWDS-based German Morphology ☆21 Apr 28, 2026 Updated 3 weeks ago
- A Corpus Data Retrieval Index using Lucene for Look-Ups ☆20 May 13, 2026 Updated last week
- Automatically extracts and normalizes an online article or blog post publication date ☆119 Aug 10, 2023 Updated 2 years ago
- Article extraction benchmark: dataset and evaluation scripts ☆369 Apr 23, 2026 Updated 3 weeks ago
- Simple multilingual lemmatizer for Python, especially useful for speed and efficiency ☆199 Jun 6, 2025 Updated 11 months ago
- Python port of Boilerpipe library ☆96 Aug 20, 2024 Updated last year
- A lexical normalizer for historical spelling variants using a transformer architecture. ☆10 Mar 12, 2025 Updated last year
- A library to extract a publication date from a web page, along with a measure of the accuracy. ☆41 Aug 13, 2019 Updated 6 years ago
- Multi Tier Annotation Search ☆12 May 13, 2024 Updated 2 years ago
- FairCopy is a word processor for the humanities scholar. ☆15 Updated this week
- texrex web page cleaning & ClaraX random walk crawler ☆11 Dec 13, 2021 Updated 4 years ago
- Next-generation Punkt sentence boundary detection with zero dependencies ☆30 Nov 18, 2025 Updated 6 months ago
- Neural network based lemmatizer for Finnish language ☆11 Sep 10, 2020 Updated 5 years ago
- Python tool to support lazy imports. ☆31 Jun 9, 2025 Updated 11 months ago
- GC4LM: A Colossal (Biased) language model for German ☆13 May 2, 2021 Updated 5 years ago
- An extensive Python library for dealing with FoLiA (Format for Linguistic Annotation) documents, a rich XML-based format for linguistic a… ☆18 Nov 18, 2024 Updated last year
- [ACL'24 Oral] Analysing The Impact of Sequence Composition on Language Model Pre-Training ☆23 Aug 18, 2024 Updated last year
- KB data lab ☆10 Dec 8, 2020 Updated 5 years ago
- Pyinfer is a model agnostic tool for ML developers and researchers to benchmark the inference statistics for machine learning models or f… ☆25 Feb 19, 2021 Updated 5 years ago
- Moved to https://codeberg.org/araichev/kml2geojson. ☆16 May 5, 2026 Updated 2 weeks ago
- DM is an environment for the study and annotation of images and texts. It is a suite of tools, enabling scholars to gather and organize t… ☆19 Dec 10, 2018 Updated 7 years ago
- SQL functions for calling OpenAI APIs ☆22 Jan 14, 2023 Updated 3 years ago
- Explanation-centered inference for question answering ☆16 Feb 7, 2018 Updated 8 years ago
- An implementation of GrASP (Shnarch et al., 2017) ☆23 Aug 29, 2022 Updated 3 years ago
- Applying NLP framework to 10-K filings in equity markets ☆15 Jul 26, 2021 Updated 4 years ago
- Datasette plugin for inserting and updating data ☆20 Mar 29, 2024 Updated 2 years ago
- Python SDK for One AI APIs. One AI is an NLP-as-a-service platform. Our APIs enable language comprehension in context, transforming text… ☆38 Aug 24, 2023 Updated 2 years ago
- Useful abstractions for trio ☆11 Aug 12, 2020 Updated 5 years ago
- RaKUn 2.0 - A fast keyword detection algorithm ☆73 Aug 5, 2025 Updated 9 months ago
- A dashboard for ARQ built with FastAPI ☆42 Dec 15, 2023 Updated 2 years ago
- Python wrapper for the Lago Rest API ☆27 Updated this week
- Extract corpora from Wikipedia dumps ☆26 Mar 26, 2019 Updated 7 years ago
- Spellchecker service based on hunspell for 90 languages ☆10 Oct 26, 2020 Updated 5 years ago
- Dataset of sentences from Hindi stories tagged with different emotion tags ☆11 Nov 26, 2019 Updated 6 years ago
- A Berkeley library for probability theory. ☆15 Jan 14, 2025 Updated last year
- Simple and easy-to-use scraper and crawler in Go. ☆12 May 4, 2020 Updated 6 years ago
- Library to test all fields of a python dictionary ☆12 Mar 28, 2018 Updated 8 years ago