HawkClaws / main_content_extractor
A library to extract the main content from HTML. Developed to gather information for LLMs and to feed data into LangChain and LlamaIndex.
☆22 Updated 6 months ago
Related projects
Alternatives and complementary repositories for main_content_extractor
- A Python library to chunk/group your texts based on semantic similarity. ☆85 Updated 4 months ago
- This code sets up a simple yet robust server using FastAPI for handling asynchronous requests for embedding generation and reranking task… ☆56 Updated 6 months ago
- A fast and lightweight pure Python library for splitting text into semantically meaningful chunks. ☆186 Updated 4 months ago
- ☆78 Updated this week
- Code for evaluating with Flow-Judge-v0.1 - an open-source, lightweight (3.8B) language model optimized for LLM system evaluations. Crafte… ☆53 Updated 3 weeks ago
- LLM finetuning ☆42 Updated last year
- ☆162 Updated 3 weeks ago
- Split and analyze text files using langchain and streamlit ☆45 Updated 6 months ago
- Crawl and convert any website into clean markdown ☆42 Updated 5 months ago
- I have explained how to create a superior RAG pipeline for complex PDFs using LlamaParse. We can extract text and tables from PDFs and QA on… ☆39 Updated 8 months ago
- OpenAI document chatbot using llama-index, pinecone and chainlit. With incremental features, giving you the tools to go from a basic RAG … ☆55 Updated 6 months ago
- ☆182 Updated this week
- LLM prompt language based on Jinja. Banks provides tools and functions to build prompts text and chat messages from generic blueprints. I… ☆67 Updated last week
- Using LlamaIndex, Redis, and OpenAI to chat with PDF documents. Supplementary material for blog post on Microsoft Developer Blog ☆108 Updated last year
- Repository for deepdoctection tutorial notebooks ☆39 Updated 4 months ago
- Explore the use of DSPy for extracting features from PDFs 🔎 ☆33 Updated 8 months ago
- High level library for batched embeddings generation, blazingly-fast web-based RAG and quantized indexes processing ⚡ ☆62 Updated 2 weeks ago
- A JS web client for connecting to Pipecat bots with voice and vision ☆38 Updated 4 months ago
- Sentence Transformers API: An OpenAI compatible embedding API server ☆37 Updated 2 months ago
- Benchmark various LLM Structured Output frameworks: Instructor, Mirascope, Langchain, LlamaIndex, Fructose, Marvin, Outlines, etc on task… ☆134 Updated 2 months ago
- ☆19 Updated 8 months ago
- RAGArch is a Streamlit-based application that empowers users to experiment with various components and parameters of Retrieval-Augmented … ☆80 Updated 9 months ago
- Code for react youtube tutorial ☆30 Updated 9 months ago
- C++ inference wrappers for running blazing fast embedding services on your favourite serverless like AWS Lambda. By Prithivi Da, PRs welc… ☆20 Updated 8 months ago
- Python SDK for experimenting, testing, evaluating & monitoring LLM-powered applications - Parea AI (YC S23) ☆74 Updated 2 months ago
- Open-source RAG evaluation through users' feedback ☆161 Updated 7 months ago
- ☆41 Updated 8 months ago
- DSPy in action with open-source LLMs. ☆57 Updated 7 months ago
- Clone of https://r.jina.ai which is deployable locally ☆28 Updated 2 months ago