Remove DIVs and styling, and normalize HTML while preserving structural information
☆14 · Oct 24, 2025 · Updated 4 months ago
Alternatives and similar repositories for clear-html
Users interested in clear-html are comparing it to the libraries listed below.
- https://mimesniff.spec.whatwg.org/ implementation for Python ☆13 · Jan 16, 2024 · Updated 2 years ago
- ☆15 · Jan 21, 2026 · Updated 2 months ago
- QMPDClient official repository ☆37 · Nov 18, 2015 · Updated 10 years ago
- Python port of SymSpell ☆17 · Feb 22, 2019 · Updated 7 years ago
- ☆10 · Jun 17, 2017 · Updated 8 years ago
- Zyte API integration for Scrapy ☆40 · Feb 16, 2026 · Updated last month
- Spider templates for automatic crawlers ☆34 · Jan 8, 2026 · Updated 2 months ago
- Zhouyi model zoo (maintained at https://github.com/Arm-China/Model_zoo) ☆12 · Dec 30, 2024 · Updated last year
- Migrated to: https://codeberg.org/openculinary/knowledge-graph ☆11 · Aug 21, 2025 · Updated 7 months ago
- A linter for Scrapy projects ☆21 · Feb 25, 2026 · Updated 3 weeks ago
- Web scraping Page Objects core library ☆104 · Mar 10, 2026 · Updated last week
- A Flutter package for showing quick interactions for any widget ☆14 · Sep 25, 2023 · Updated 2 years ago
- An AI-powered GitHub search tool utilising Generative UI ☆14 · Jul 20, 2024 · Updated last year
- Control your Home Assistant media players from your desktop using MPRIS ☆33 · Aug 23, 2024 · Updated last year
- Turn television drama into storyworld knowledge graphs ☆19 · Apr 19, 2025 · Updated 11 months ago
- Lightweight LAMA inference wrapper ☆26 · Sep 28, 2023 · Updated 2 years ago
- A beautiful, simple, easy-to-use, and easy-to-customize ESP8266 firmware! Star, fork, and follow! ☆15 · Feb 10, 2019 · Updated 7 years ago
- Automatic unit test generation for Scrapy ☆57 · Jul 12, 2021 · Updated 4 years ago
- Describes a methodology for use with SHACL 1.2, including reifications ☆34 · Mar 2, 2026 · Updated 2 weeks ago
- A community visualisation for Google Data Studio in the style of the Lighthouse site-speed auditing tool's gauges ☆21 · Feb 5, 2023 · Updated 3 years ago
- A modular graph-based Retrieval-Augmented Generation (RAG) system ☆14 · Updated this week
- A list of delightful MINDSTORMS software and resources ☆15 · Mar 10, 2025 · Updated last year
- A pure-Python robots.txt parser with support for modern conventions ☆86 · Jan 29, 2026 · Updated last month
- Agent-based market simulation ☆15 · Aug 10, 2024 · Updated last year
- Generate standalone HTML from an OpenAPI Specification ☆24 · Jul 13, 2025 · Updated 8 months ago
- Apache Pekko-based web crawler that uses Playwright to crawl websites and extract text data and links for further processing ☆22 · Aug 12, 2025 · Updated 7 months ago
- Bootstrap a server from llama-cpp in a few lines of Python ☆12 · Jul 6, 2024 · Updated last year
- A Dart & Flutter package for translating numbers and dates into a human-readable format ☆18 · Sep 24, 2025 · Updated 5 months ago
- Official TypeScript/JavaScript SDK for the Supadata API ☆21 · Feb 23, 2026 · Updated 3 weeks ago
- Default Twisted does not ship with a CONNECT-enabled HTTP(S) proxy. This code provides one. ☆51 · Feb 21, 2017 · Updated 9 years ago
- Library to populate items using XPath and CSS with a convenient API ☆48 · Jan 29, 2026 · Updated last month
- Scrape Airbnb, Booking, and Hotels.com from a single JavaScript module. ❗No longer maintained. ☆18 · Apr 18, 2023 · Updated 2 years ago
- Fine-tuning the Whisper ASR model for the Belarusian language ☆17 · Feb 16, 2025 · Updated last year
- BUD-E (Buddy) is an open-source voice assistant framework that facilitates seamless interaction with AI models and APIs, enabling the cre… ☆22 · Oct 10, 2024 · Updated last year
- The Florence Tool CLI provides a command-line interface for processing images using the Florence-2 model. This tool allows users to apply… ☆16 · Jan 21, 2025 · Updated last year
- Remove clutter from URLs and return a canonicalized version ☆21 · Jun 3, 2024 · Updated last year
- An accurate, extensible, and fast HTML-to-Markdown converter ☆23 · Feb 7, 2026 · Updated last month
- Simple program to get A LOT OF invites to https://foobar.withgoogle.com/ ☆31 · Jun 12, 2019 · Updated 6 years ago
- Abstraction for communicating with REST APIs in Flutter projects ☆12 · Mar 13, 2026 · Updated last week