HawkClaws / main_content_extractor
A library to extract the main content from HTML. Developed for supplying information to LLMs and for feeding data into LangChain and LlamaIndex.
☆28 Updated 8 months ago
Alternatives and similar repositories for main_content_extractor:
Users who are interested in main_content_extractor are comparing it to the libraries listed below.
- Split and analyze text files using langchain and streamlit ☆46 Updated 8 months ago
- Complete example of how to build an Agentic RAG architecture with Redis, AWS Bedrock, and LlamaIndex. ☆89 Updated last month
- Crawl and convert any website into clean markdown ☆42 Updated 8 months ago
- RAG Citation enhances Retrieval-Augmented Generation (RAG) by automatically generating relevant citations for AI-generated content. It en… ☆21 Updated 2 months ago
- This code sets up a simple yet robust server using FastAPI for handling asynchronous requests for embedding generation and reranking task… ☆57 Updated 8 months ago
- Pipeline for converting PDFs to raw text with PaddleOCR ☆21 Updated last year
- Build reliable, secure, and production-ready AI apps easily. ☆57 Updated 2 weeks ago
- A fast, lightweight and easy-to-use Python library for splitting text into semantically meaningful chunks. ☆235 Updated last week
- A Python library to chunk/group your texts based on semantic similarity. ☆92 Updated 6 months ago
- Natural Language Interfaces Powered by LLMs ☆91 Updated 5 months ago
- Deploy a lightweight, full OpenAI API for production with vLLM, supporting /v1/embeddings with all embedding models. ☆39 Updated 6 months ago
- Easy to deploy. A cloud service providing a Python code interpreter sandbox for Code-Interpreter. ☆48 Updated 10 months ago
- I have explained how to create a superior RAG pipeline for complex PDFs using LlamaParse. We can extract text and tables from PDFs and QA on… ☆41 Updated 11 months ago
- ElasticSearch agent based on ElasticSearch, LangChain and ChatGPT 4 ☆43 Updated last year
- OpenAI document chatbot using llama-index, pinecone and chainlit. With incremental features, giving you the tools to go from a basic RAG … ☆64 Updated 9 months ago
- ☆22 Updated 5 months ago
- TextEmbed is a REST API crafted for high-throughput and low-latency embedding inference. It accommodates a wide variety of embedding mode… ☆22 Updated 4 months ago
- Clone of https://r.jina.ai which is deployable locally ☆34 Updated 4 months ago
- RAGArch is a Streamlit-based application that empowers users to experiment with various components and parameters of Retrieval-Augmented … ☆84 Updated 11 months ago
- An open source framework for Retrieval-Augmented System (RAG) that uses semantic search to help retrieve the expected results and generate h… ☆20 Updated 6 months ago
- LLM finetuning ☆42 Updated last year
- ☆39 Updated last year
- An example application built with LangChain CLI and LangServe ☆77 Updated last year
- Open-source RAG evaluation through users' feedback ☆167 Updated 9 months ago
- Docutron Toolkit: detection and segmentation analysis for legal data extraction over documents. ☆25 Updated last year
- Using LlamaIndex, Redis, and OpenAI to chat with PDF documents. Supplementary material for blog post on Microsoft Developer Blog ☆110 Updated last year
- ✅ Pytest-style test runner for langchain projects ☆25 Updated last year
- ☆26 Updated last year
- This repo is for handling Question Answering, especially for Multi-hop Question Answering ☆67 Updated last year
- ☆52 Updated 11 months ago