carlosplanchon / betterhtmlchunking

BetterHTMLChunking is a Python library for intelligent HTML segmentation. It builds a DOM tree from raw HTML and extracts content-rich regions of interest, which simplifies downstream content analysis. It is well suited to LLM-based processing.
33 · Updated 3 weeks ago
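
To make the idea in the description concrete (build a tree from raw HTML, then pick out content-rich regions), here is a minimal sketch that uses only the Python standard library. It does not use betterhtmlchunking and is not its API or its algorithm: `Node`, `TreeBuilder`, `extract_regions`, and the `max_length` parameter are hypothetical names chosen for illustration, and the "split any subtree whose text is too long" rule is an assumed heuristic.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not stay on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


@dataclass
class Node:
    """Hypothetical tree node: a tag, its children, and its direct text."""
    tag: str
    children: list = field(default_factory=list)
    text: str = ""

    def text_length(self) -> int:
        # Total characters of text in this subtree.
        return len(self.text) + sum(c.text_length() for c in self.children)

    def full_text(self) -> str:
        # Direct text first, then children; interleaving order is not kept.
        parts = [self.text] + [c.full_text() for c in self.children]
        return " ".join(p for p in parts if p)


class TreeBuilder(HTMLParser):
    """Builds a simple tree of Node objects from raw HTML."""

    def __init__(self):
        super().__init__()
        self.root = Node("root")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag)
        self.stack[-1].children.append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; ignore stray closing tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        data = data.strip()
        if data:
            node = self.stack[-1]
            node.text = (node.text + " " + data).strip()


def extract_regions(node, max_length):
    """Return subtrees whose text fits in max_length (assumed heuristic).

    Larger subtrees are split into their children. Direct text of a node
    that gets split is dropped, which a real implementation would handle.
    """
    length = node.text_length()
    if length == 0:
        return []
    if length <= max_length or not node.children:
        return [node]
    regions = []
    for child in node.children:
        regions.extend(extract_regions(child, max_length))
    return regions


html = (
    "<html><body><nav><a>Home</a></nav>"
    "<article><p>First paragraph of text.</p>"
    "<p>Second paragraph of text.</p></article></body></html>"
)
builder = TreeBuilder()
builder.feed(html)
regions = extract_regions(builder.root, max_length=30)
for region in regions:
    print(region.tag, repr(region.full_text()))
```

With `max_length=30`, the sample document is too long to be a single region, so the sketch descends until it reaches the `nav` block and the two `p` elements. Raising the limit yields fewer, larger regions.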

Alternatives and similar repositories for betterhtmlchunking:

Users who are interested in betterhtmlchunking compare it to the libraries listed below.