paulpierre / markdown-crawler
A multithreaded 🕸️ web crawler that recursively crawls a website and creates a 🔽 markdown file for each page, designed for LLM RAG
☆342 Updated 5 months ago
Alternatives and similar repositories for markdown-crawler:
Users that are interested in markdown-crawler are comparing it to the libraries listed below
- Extract structured text from pdfs quickly ☆383 Updated this week
- Structured information extraction from documents ☆297 Updated 3 months ago
- Parse PDFs into markdown using Vision LLMs ☆197 Updated 2 weeks ago
- Yet another open source Perplexity ☆404 Updated 3 months ago
- 90% of what you need for LLM app development. Nothing you don't. ☆227 Updated last week
- HTML to Markdown converter and crawler. ☆507 Updated last year
- 🥤 RAGLite is a Python toolkit for Retrieval-Augmented Generation (RAG) with PostgreSQL or SQLite ☆678 Updated this week
- Generic rag framework to apply the power of LLMs on any given dataset ☆499 Updated this week
- Use OpenAI's realtime API for chatting with your documents ☆306 Updated 3 months ago
- A prompting library ☆150 Updated 3 months ago
- This repository hosts a suite of specialized agents designed to power your brainstorming sessions. Each agent brings a unique perspective… ☆279 Updated 2 months ago
- SearchGPT / Perplexity Pages clone, but personalised for you. ☆229 Updated 4 months ago
- Lightweight, performant, deep table extraction ☆387 Updated last month
- On-premises conversational RAG with configurable containers ☆297 Updated this week
- Minimalist LLM Framework in 100 Lines. Enable LLMs to Program Themselves. ☆211 Updated this week
- Your first AI prompt engineer ☆357 Updated 2 months ago
- ☆259 Updated 6 months ago
- Prompt optimization scratch ☆555 Updated last week
- An experiment in meeting transcription and diarization with just an LLM. Maybe I went a little overboard though ☆368 Updated 2 months ago
- High-performance retrieval engine for unstructured data ☆1,121 Updated last week
- Flexible and powerful multi-agent AI framework ☆334 Updated this week
- LLM for Long Text Summary (Comprehensive Bulleted Notes) ☆468 Updated this week
- Awesome MCP Servers - A curated list of Model Context Protocol servers ☆329 Updated this week
- A simple Python program to implement the search-extract-summarize flow. ☆234 Updated last month
- The simplest open-source implementation of perplexity.ai ☆281 Updated 4 months ago
- ☆167 Updated this week
- ☆152 Updated 3 months ago
- No-code ETL and data pipelines with AI and NLP ☆275 Updated 2 months ago
- An enterprise-grade AI retriever designed to streamline AI integration into your applications, ensuring cutting-edge accuracy. ☆279 Updated this week
- ☆198 Updated last month