carlosplanchon / betterhtmlchunking
BetterHTMLChunking is a Python library for intelligent HTML segmentation. It builds a DOM tree from raw HTML and extracts content-rich regions of interest, making content analysis effortless. Great for LLM-based processing.
☆33 Updated 3 weeks ago
Alternatives and similar repositories for betterhtmlchunking:
Users who are interested in betterhtmlchunking are comparing it to the libraries listed below.
- Python library for Entities, relationships and schemas extraction from documents ☆38 Updated 4 months ago
- ☆87 Updated 2 months ago
- Run AI generated code in isolated sandboxes ☆54 Updated 2 months ago
- An MCP Server that's also an MCP Client. Useful for letting Claude develop and test MCPs without needing to reset the application. ☆115 Updated last month
- A discovery and compression tool for your Python codebase. Creates a knowledge graph for a LLM context window, efficiently outlining your… ☆88 Updated 4 months ago
- ☆37 Updated last month
- ☆16 Updated 6 months ago
- This small API downloads and exposes access to NeuML's txtai-wikipedia and full wikipedia datasets, taking in a query and returning full … ☆90 Updated 3 weeks ago
- The AI runtime that turns your framework functions into OpenAI compatible endpoints ☆79 Updated 2 months ago
- An open source, Gradio-based chatbot app that combines the best of retrieval augmented generation and prompt engineering into an intellig… ☆51 Updated 8 months ago
- Your appetite for code + Claude's capabilities = Limitless creation. No experience required - just pure hunger! 🧠⚡💻 ☆17 Updated this week
- An example chatbot application built on the Letta API, which makes each chatbot a stateful agent (agent with memory) under the hood. ☆33 Updated 2 weeks ago
- MarinaBox is a toolkit for creating and managing secure, isolated environments for AI agents ☆119 Updated 2 months ago
- An Automated AI-Powered Prompt Optimization Framework ☆184 Updated 8 months ago
- Automated web scraping spider generation using Browser Use and LLMs. Streamline the creation of Playwright-based spiders with minimal man… ☆58 Updated last week
- Open-source LLM app starter templates – easily get started with a systematic, rapid workflow for taking an LLM app from prototype to prod… ☆10 Updated 7 months ago
- Turn Webpage to LLM friendly input text. Similar to Firecrawl and Jina Reader API. Makes RAG, AI web scraping, image & webpage links extr… ☆163 Updated last week
- A novel agentic memory system ☆404 Updated last month
- ☆21 Updated last week
- MCP server for enabling LLM applications to perform deep research via the MCP protocol ☆93 Updated 3 weeks ago
- For LLMs to better code with Jina API ☆145 Updated last week
- Optimize Document Retrieval with Fine-Tuned KnowledgeBases ☆128 Updated last month
- A toolkit for building computer use AI agents ☆158 Updated this week
- Generate a wiki for your research topic, sourcing from the web and your docs. ☆46 Updated last month
- This project implements the "Modular RAG" framework using Haystack & Hypster ☆32 Updated 5 months ago
- A flexible, adaptive classification system for dynamic text classification ☆159 Updated last month
- Gumloop Unified Model Context Protocol (guMCP) ☆346 Updated this week
- A virtual employee that scours the web, organizes data, and delivers results in a spreadsheet ☆74 Updated last week
- Routing on Random Forest (RoRF) ☆147 Updated 7 months ago
- Official code of the paper "SimGRAG: Leveraging Similar Subgraphs for Knowledge Graphs Driven Retrieval-Augmented Generation" ☆106 Updated 4 months ago