zytedata / clear-html
Removes DIVs and styling, and normalizes HTML while preserving structural information
☆10 · Updated 3 months ago
Alternatives and similar repositories for clear-html
Users that are interested in clear-html are comparing it to the libraries listed below
- Fetch all GitHub issues for a repository ☆13 · Updated 9 months ago
- Dockerized FastAPI wrapper around the recognize-anything image recognition models ☆25 · Updated last year
- Scripts and ideas to manage tons and tons of images and movies ☆17 · Updated 2 months ago
- Tools for building SQLite databases from files and directories ☆12 · Updated last year
- Functional composable pipelines allowing clean separation of the business logic and its implementation ☆11 · Updated 11 months ago
- Diff filtering, text mapping, and windowed transforms for LLM apps ☆15 · Updated 3 weeks ago
- A simple GitHub Actions script that builds a llamafile and uploads it to Hugging Face ☆14 · Updated last year
- Pluggable DSL that uses pipes to perform a series of linear transformations to extract data ☆16 · Updated 10 months ago
- Scrapy extension that gives you all the scraping monitoring, alerting, scheduling, and data validation you will need straight out of the… ☆37 · Updated 9 months ago
- Load data about Python packages from PyPI into SQLite ☆14 · Updated last year
- Loadable spellfix1 extension for sqlite as python package ☆26 · Updated last year
- Library that helps use puppeteer in scrapy. ☆52 · Updated last month
- Scrape various open data directories to create an index of what's available out there ☆36 · Updated 3 months ago
- A fast TUI application (with optional webui) to visually navigate and inspect JSON and JSONL data. Easily localize parse errors in large … ☆13 · Updated 7 months ago
- Multi-agent workflows and complex Agent interactions, both via YAML manifest and programmatic usage. Pydantic-AI and LiteLLM backends. Hu… ☆17 · Updated last week
- Datasette plugin for searching all searchable tables at once ☆24 · Updated 8 months ago
- Web scraping Page Objects core library ☆99 · Updated 3 months ago
- Web crawler for Burplist, a search engine for craft beers in Singapore ☆14 · Updated this week
- LLM plugin for asking questions of LLM's own documentation, and related packages ☆16 · Updated last week
- A Python library for real-time PostgreSQL event-driven cache invalidation. ☆22 · Updated 3 weeks ago
- Check if timestamp falls within specific boundaries ☆11 · Updated last year
- A curl HTTP adapter switch for the requests library: make browser-like requests with custom TLS fingerprints. ☆16 · Updated last week
- HTML to markdown converter ☆39 · Updated 3 weeks ago
- This repository is designed for deploying and managing server processes that handle embeddings using the Infinity Embedding model or Larg… ☆23 · Updated 2 months ago
- pyppeteer stealth plugin, attempts to look like a normal browser ☆22 · Updated 7 months ago
- Access llamafile localhost models via LLM ☆19 · Updated last year
- Crawling framework, RSS reader and parser ☆27 · Updated last week
- Simple web crawler in Go based on text density ☆12 · Updated 2 years ago
- Python data explorer. ☆11 · Updated 5 months ago
- ☆7 · Updated 4 months ago