HawkClaws / main_content_extractor
A library to extract the main content from HTML. Developed for supplying information to LLMs and for feeding data into LangChain and LlamaIndex.
☆21Updated 5 months ago
Related projects
Alternatives and complementary repositories for main_content_extractor
- Code for evaluating with Flow-Judge-v0.1 - an open-source, lightweight (3.8B) language model optimized for LLM system evaluations. Crafte…☆52Updated last week
- A fast and lightweight pure Python library for splitting text into semantically meaningful chunks.☆172Updated 3 months ago
- This code sets up a simple yet robust server using FastAPI for handling asynchronous requests for embedding generation and reranking task…☆55Updated 6 months ago
- An explanation of how to create a superior RAG pipeline for complex PDFs using LlamaParse. We can extract text and tables from PDFs and QA on…☆38Updated 8 months ago
- A Python library to chunk/group your texts based on semantic similarity.☆85Updated 3 months ago
- Universal text classifier for generative models☆20Updated 3 months ago
- LangChain agent utilizing OpenAI function calls to execute Git commands using natural language☆44Updated last year
- C++ inference wrappers for running blazing fast embedding services on your favourite serverless like AWS Lambda. By Prithivi Da, PRs welc…☆19Updated 8 months ago
- LLM finetuning☆42Updated last year
- Pipeline for converting PDFs to raw text with PaddleOCR☆20Updated last year
- Open Source Text Embedding Models with OpenAI Compatible API☆131Updated 3 months ago
- This repo is for handling Question Answering, especially for Multi-hop Question Answering☆64Updated 10 months ago
- Generate visual podcasts about novels using open source models☆23Updated last year
- Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model☆14Updated last month
- ⚡️ 80x faster language detection with Fasttext | Split text by language for TTS☆120Updated last month
- A JS web client for connecting to Pipecat bots with voice and vision☆37Updated 3 months ago
- Retrieval of fully structured data made easy. Use LLMs or custom models. Specialized on PDFs and HTML files. Extensive support of tabular…☆52Updated last week
- A Dify tool for storing and retrieving long-term memory, using Dify's built-in Knowledge dataset for storing memories; each user has a stan…☆41Updated 3 months ago
- Conduct consumer interviews with synthetic focus groups using LLMs and LangChain☆43Updated last year
- Benchmark various LLM Structured Output frameworks: Instructor, Mirascope, Langchain, LlamaIndex, Fructose, Marvin, Outlines, etc on task…☆129Updated last month
- High level library for batched embeddings generation, blazingly-fast web-based RAG and quantized indexes processing ⚡☆59Updated this week
- Build a Streamlit Chatbot using Langchain, ColBERT, Ragatouille, and ChromaDB☆116Updated 9 months ago
- Clone of https://r.jina.ai which is deployable locally☆26Updated last month