A fork of Dragnet that also extracts author, headline, date, and keywords from context, with built-in metadata extraction, all in one package
☆299 · May 19, 2025 · Updated 9 months ago
Alternatives and similar repositories for extractnet
Users that are interested in extractnet are comparing it to the libraries listed below
- Article extraction benchmark: dataset and evaluation scripts ☆354 · Sep 23, 2025 · Updated 5 months ago
- Just the facts -- web page content extraction ☆1,280 · Jul 8, 2025 · Updated 7 months ago
- A Python-based HTML-to-text conversion library, command-line client, and web service. ☆337 · Nov 18, 2025 · Updated 3 months ago
- Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XM… ☆5,337 · Sep 12, 2025 · Updated 5 months ago
- AI-based web-wrapper for web-content-extraction ☆102 · Feb 6, 2023 · Updated 3 years ago
- Heuristic-based boilerplate removal tool ☆811 · Feb 25, 2025 · Updated last year
- Source code for the paper "Web2Text: Deep Structured Boilerplate Removal", full paper @ ECIR'18 ☆170 · Oct 28, 2021 · Updated 4 years ago
- ☆22 · Jun 30, 2021 · Updated 4 years ago
- Parallelized automatic corpus collection for ASR. Forked from https://github.com/EgorLakomkin/KTSpeechCrawler ☆23 · Mar 21, 2021 · Updated 4 years ago
- Parses and Plots Illumina SAV files ☆13 · Jan 30, 2019 · Updated 7 years ago
- Vast-ai public repository for open sourced tools, plugins, etc. ☆16 · Nov 4, 2024 · Updated last year
- Talk to your computer. You know you want to. ☆11 · Mar 13, 2016 · Updated 9 years ago
- news-please - an integrated web crawler and information extractor for news that just works ☆2,387 · Sep 21, 2025 · Updated 5 months ago
- Simple audio AE ☆13 · Nov 10, 2024 · Updated last year
- Exploits Wikipedia's daily view counts to find out what topics are current trends ☆18 · May 7, 2013 · Updated 12 years ago
- Sequence to sequence model for Arabic punctuation prediction. ☆12 · Feb 13, 2020 · Updated 6 years ago
- Protégé Desktop plugin for defeasible reasoning in OWL ontologies using the style of Kraus, Lehmann and Magidor ☆14 · May 1, 2020 · Updated 5 years ago
- Sample solution to build a deployment pipeline for Amazon SageMaker. ☆13 · Jul 18, 2022 · Updated 3 years ago
- Speech recognition module for Python, supporting several engines and APIs, online and offline. ☆13 · Mar 9, 2022 · Updated 3 years ago
- Python interface to XSB Prolog, SWI Prolog, ECLiPSe Prolog, Datalog Educational System and Flora-2/Ergo Lite ☆10 · Feb 27, 2021 · Updated 5 years ago
- Code and data used to build a training dataset for dragnet models ☆10 · Nov 29, 2020 · Updated 5 years ago
- RAG-Fusion implementation using Langchain, Weaviate and OpenAI ☆13 · Oct 31, 2023 · Updated 2 years ago
- Evaluation of STT models for the German language ☆15 · Jan 22, 2022 · Updated 4 years ago
- A Python 3 compatible version of goose http://goose3.readthedocs.io/en/latest/index.html ☆902 · Feb 6, 2026 · Updated 3 weeks ago
- Detect and classify pagination links ☆105 · Feb 10, 2026 · Updated 2 weeks ago
- YouTube-Based Multimodal Recipe Recommender ☆14 · Jul 11, 2024 · Updated last year
- A repository for trainable multi-speaker TTS ☆14 · Nov 28, 2021 · Updated 4 years ago
- A boilerplate removal algorithm ☆12 · Mar 22, 2016 · Updated 9 years ago
- linkbak is a web page archiver: it reads a list of links and dumps the corresponding pages in HTML and PDF. ☆13 · Dec 8, 2022 · Updated 3 years ago
- AI-based web extractor ☆12 · Feb 25, 2023 · Updated 3 years ago
- A graph query engine ☆23 · Nov 25, 2025 · Updated 3 months ago
- Earth Mover's Distance based Similarity Join on Hadoop ☆12 · Mar 9, 2016 · Updated 9 years ago
- Labeled data for homograph disambiguation ☆62 · Jun 1, 2023 · Updated 2 years ago
- Extract embedded metadata from HTML markup ☆951 · Oct 1, 2025 · Updated 4 months ago
- Audio samples accompanying publications related to DF-Conformer, a speech enhancement model. ☆31 · May 22, 2025 · Updated 9 months ago
- ☆15 · Apr 26, 2024 · Updated last year
- mule is a tool to be used with 'go generate' to embed external resources files into Go code. ☆16 · Aug 16, 2021 · Updated 4 years ago
- ☆16 · Apr 24, 2024 · Updated last year
- NISQA - Non-Intrusive Speech Quality and TTS Naturalness Assessment ☆16 · Apr 13, 2022 · Updated 3 years ago