Fast and robust date extraction from web pages, with Python or on the command line
☆145 · Nov 4, 2025 · Updated 3 months ago
Alternatives and similar repositories for htmldate
Users who are interested in htmldate are comparing it to the libraries listed below.
- Clean, filter and sample URLs to optimize data collection – Python & command-line – Deduplication, spam, content and language filters ☆159 · Dec 19, 2025 · Updated 2 months ago
- Small string compression using the smaz compression algorithm. Fast, because it's in C. Supports Python 3+ ☆13 · Oct 18, 2025 · Updated 4 months ago
- Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XM… ☆5,337 · Sep 12, 2025 · Updated 5 months ago
- Automatically extracts and normalizes an online article or blog post publication date ☆119 · Aug 10, 2023 · Updated 2 years ago
- Next-generation Punkt sentence boundary detection with zero dependencies ☆28 · Nov 18, 2025 · Updated 3 months ago
- Text generation using language models with multiple exit heads ☆16 · Sep 18, 2025 · Updated 5 months ago
- Applying NLP framework to 10-K filings in equity markets ☆14 · Jul 26, 2021 · Updated 4 years ago
- A Python library to convert KML files to GeoJSON files ☆15 · Mar 29, 2023 · Updated 2 years ago
- VoltDB Click Stream Processing Example. ☆16 · Jan 2, 2018 · Updated 8 years ago
- Download wildfires data from NOAA satellites ☆14 · Updated this week
- CalorieScope is an Android application designed to help users maintain a healthy lifestyle. ☆12 · May 23, 2021 · Updated 4 years ago
- Explanation-centered inference for question answering ☆16 · Feb 7, 2018 · Updated 8 years ago
- Python port of Boilerpipe library ☆96 · Aug 20, 2024 · Updated last year
- Neural network based lemmatizer for Finnish language ☆11 · Sep 10, 2020 · Updated 5 years ago
- Instance Neighbouring by using Knowledge ☆18 · Oct 3, 2024 · Updated last year
- A spaCy wrapper of Entity-Fishing (component) for named entity disambiguation and linking on Wikidata ☆169 · Nov 7, 2022 · Updated 3 years ago
- text-to-speech alignment Java software ☆20 · Aug 25, 2019 · Updated 6 years ago
- Multi-task model for named-entity recognition, relation extraction, entity mention detection and coreference resolution. ☆46 · Jun 26, 2024 · Updated last year
- ☆18 · Feb 28, 2022 · Updated 3 years ago
- A compact yet versatile menu-bar app for keeping your Mac awake. ☆18 · Jul 7, 2021 · Updated 4 years ago
- Python SDK for One AI APIs. One AI is an NLP-as-a-service platform. Our APIs enable language comprehension in context, transforming text… ☆39 · Aug 24, 2023 · Updated 2 years ago
- Code repo for "Model-Generated Pretraining Signals Improves Zero-Shot Generalization of Text-to-Text Transformers" (ACL 2023) ☆22 · Nov 1, 2023 · Updated 2 years ago
- 🌸 Train floret vectors ☆18 · May 4, 2023 · Updated 2 years ago
- Model implementation for the contextual embeddings project ☆40 · Jun 2, 2025 · Updated 8 months ago
- Modern, fast (high-performance) asynchronous scraping framework based on standard Python type hints and Pydantic. ☆20 · Feb 25, 2024 · Updated 2 years ago
- An implementation of GrASP (Shnarch et al., 2017) ☆23 · Aug 29, 2022 · Updated 3 years ago
- QLoRA for Masked Language Modeling ☆23 · Sep 11, 2023 · Updated 2 years ago
- The CleanCoNLL dataset from our EMNLP 2023 paper where we corrected annotation errors and inconsistencies in CoNLL-03. ☆25 · Jul 2, 2024 · Updated last year
- Just the facts -- web page content extraction ☆1,280 · Jul 8, 2025 · Updated 7 months ago
- Augmenty is an augmentation library based on spaCy for augmenting texts. ☆157 · May 24, 2024 · Updated last year
- A Python package for benchmarking interpretability techniques on Transformers. ☆215 · Sep 29, 2024 · Updated last year
- ☆26 · Jun 17, 2024 · Updated last year
- A blazingly fast domain extraction library written in Rust ☆67 · Aug 11, 2025 · Updated 6 months ago
- This repository contains an easy and intuitive approach to few-shot classification using sentence-transformers or spaCy models, or zero-s… ☆220 · Jan 20, 2025 · Updated last year
- Visualize multi-model embedding spaces. The first goal is to quickly get a lay of the land of any embedding space. Then be able to scroll… ☆27 · May 16, 2024 · Updated last year
- Nearly Inference Free Embeddings: make your RAG queries 500x faster ☆70 · Feb 20, 2026 · Updated last week
- This project is a wrapper for Leilex, a legal entity identifier API. Includes ISIN-LEI conversion. Search for an LEI number using a company name. ☆25 · Oct 6, 2024 · Updated last year
- ☆10 · Nov 10, 2022 · Updated 3 years ago
- Document level Attitude and Relation Extraction toolkit (AREkit) for sampling and processing large text collections with ML and for ML ☆65 · Feb 5, 2026 · Updated 3 weeks ago