zytedata / clear-html
Remove DIVs and styling, and normalize HTML while preserving structural information
☆11 · Updated 3 months ago
Alternatives and similar repositories for clear-html
Users who are interested in clear-html are comparing it to the libraries listed below.
- Python client for Zyte API ☆24 · Updated this week
- Functional composable pipelines allowing clean separation of the business logic and its implementation ☆11 · Updated last year
- Tools for building SQLite databases from files and directories ☆12 · Updated last year
- Page Object pattern for Scrapy ☆121 · Updated last week
- 360M model running in the browser on WebGPU ☆22 · Updated 9 months ago
- A server code for serving BERT-based models for text classification. It is designed by SerpApi for heavy-load prototyping and production … ☆14 · Updated last year
- Load data about Python packages from PyPI into SQLite ☆14 · Updated last year
- Fetch all GitHub issues for a repository ☆13 · Updated 10 months ago
- Advanced memory system for any MCP Client enabling persistent conversations, semantic search, pattern recognition, and intelligent contex… ☆23 · Updated this week
- Web scraping Page Objects core library ☆101 · Updated last week
- Remote web browser automation. ☆19 · Updated 11 months ago
- Lightweight OpenAI wrapper using FastAPI. Add rate limits to OpenAI usage, optionally log and store all API calls, and share regulated Op… ☆13 · Updated last year
- https://mimesniff.spec.whatwg.org/ implementation for Python ☆13 · Updated last year
- LLM plugin for asking questions of LLM's own documentation, and related packages ☆16 · Updated last month
- The short and sweet way to create API clients in Python ☆25 · Updated this week
- Library that helps use puppeteer in scrapy. ☆52 · Updated last month
- Scripts and ideas to manage tons and tons of images and movies ☆17 · Updated 2 months ago
- Pluggable DSL that uses pipes to perform a series of linear transformations to extract data ☆16 · Updated 10 months ago
- Web crawler for Burplist, a search engine for craft beers in Singapore ☆14 · Updated this week
- Git scrapers for scraping the fediverse ☆17 · Updated this week
- Python client for txtai ☆15 · Updated last month
- A collection of prompts for use with the LLM CLI tool ☆16 · Updated last year
- A Python library for real-time PostgreSQL event-driven cache invalidation. ☆22 · Updated last month
- Datasette plugin for searching all searchable tables at once ☆24 · Updated 9 months ago
- A fork of https://github.com/AtuboDad/playwright_stealth ☆97 · Updated this week
- "llm python" is a command to run a Python interpreter in the LLM virtual environment ☆33 · Updated last year
- Plugin for LLM adding a Markov chain generating model ☆19 · Updated 11 months ago
- Spider templates for automatic crawlers. ☆29 · Updated last month
- Library to populate items using XPath and CSS with a convenient API ☆48 · Updated 2 months ago
- Happy Eyeballs for pre-resolved hosts ☆28 · Updated this week