paulpierre / markdown-crawler
A multithreaded 🕸️ web crawler that recursively crawls a website and creates a markdown file for each page, designed for LLM RAG
⭐ 366 · Updated 7 months ago
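The description above (multithreaded, recursive, one markdown file per page) can be illustrated with a small standard-library sketch. This is not markdown-crawler's actual code or API. `crawl`, `fetch_url`, `PageToMarkdown`, and `save_markdown` are hypothetical names. The HTML-to-markdown step is a toy that handles only headings, list items, and plain text. "Recursive" is implemented as a level-by-level breadth-first crawl bounded by `max_depth`, where each level's pages are fetched concurrently by a thread pool and only links on the start URL's host are followed.

```python
# Hypothetical sketch, NOT markdown-crawler's implementation or API.
from concurrent.futures import ThreadPoolExecutor
from html.parser import HTMLParser
from pathlib import Path
from typing import Callable, Optional
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen

HEADINGS = ("h1", "h2", "h3", "h4", "h5", "h6")


class PageToMarkdown(HTMLParser):
    """Collects absolute links plus a rough markdown rendering of the text."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []
        self.lines: list[str] = []
        self._prefix = ""  # markdown prefix for the next text run
        self._skip = 0     # > 0 while inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag in HEADINGS:
            self._prefix = "#" * int(tag[1]) + " "
        elif tag == "li":
            self._prefix = "- "
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links and drop #fragments.
                self.links.append(urldefrag(urljoin(self.base_url, href))[0])

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1
        elif tag in HEADINGS or tag == "li":
            self._prefix = ""

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip:
            self.lines.append(self._prefix + text)
            self._prefix = ""


def fetch_url(url: str) -> str:
    """Default fetcher using only the standard library."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(start_url: str, fetch: Callable[[str], str] = fetch_url,
          max_depth: int = 2, workers: int = 8) -> dict[str, str]:
    """Breadth-first, same-host crawl; returns {url: markdown}."""
    host = urlparse(start_url).netloc
    seen = {start_url}
    frontier = [start_url]
    pages: dict[str, str] = {}

    def process(url: str) -> tuple[str, Optional[str], list[str]]:
        try:
            html = fetch(url)
        except OSError:  # unreachable page (URLError subclasses OSError): skip
            return url, None, []
        parser = PageToMarkdown(url)
        parser.feed(html)
        parser.close()
        return url, "\n".join(parser.lines), parser.links

    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(max_depth + 1):  # depth 0 is the start page
            if not frontier:
                break
            next_frontier = []
            # Fetch and parse the whole level concurrently (I/O-bound work).
            for url, markdown, links in pool.map(process, frontier):
                if markdown is not None:
                    pages[url] = markdown
                for link in links:
                    if urlparse(link).netloc == host and link not in seen:
                        seen.add(link)
                        next_frontier.append(link)
            frontier = next_frontier
    return pages


def save_markdown(pages: dict[str, str], out_dir: str = "markdown") -> None:
    """Write one .md file per page; the URL-to-filename scheme is an assumption."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for url, markdown in pages.items():
        name = (urlparse(url).path.strip("/") or "index").replace("/", "_")
        (out / f"{name}.md").write_text(markdown, encoding="utf-8")
```

For example, `save_markdown(crawl("https://example.com/", max_depth=1))` would fetch the start page plus the same-host pages it links to and write one file per page into `./markdown/`. Threads are a reasonable fit here because the work is dominated by network waits; a real tool would also want politeness delays, robots.txt handling, and a proper HTML-to-markdown converter.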
Alternatives and similar repositories for markdown-crawler:
Users interested in markdown-crawler are comparing it to the libraries listed below:
- Prompt optimization scratch · ⭐ 678 · Updated 3 weeks ago
- Detect and extract tables to markdown and csv · ⭐ 734 · Updated 2 months ago
- Summarize and query from a lot of heterogeneous documents. Any LLM provider, any filetype, scalable (?), WIP · ⭐ 440 · Updated this week
- Structured information extraction from documents · ⭐ 312 · Updated 6 months ago
- Extract structured text from pdfs quickly · ⭐ 450 · Updated last month
- Parse PDFs into markdown using Vision LLMs · ⭐ 327 · Updated last month
- ScribeWizard: Generate organized notes from audio using Groq, Whisper, and Llama3 · ⭐ 483 · Updated 2 months ago
- Assorted tools · ⭐ 716 · Updated this week
- SearchGPT / Perplexity Pages clone, but personalised for you. · ⭐ 236 · Updated 7 months ago
- ⭐ 219 · Updated 5 months ago
- HTML to Markdown converter and crawler. · ⭐ 532 · Updated last year
- A fast tool to convert any website into LLM-ready markdown data. Built by https://supermemory.ai · ⭐ 1,233 · Updated 8 months ago
- Stripped down, stable version of firecrawl optimized for self-hosting and ease of contribution. Billing logic and AI features are compl… · ⭐ 401 · Updated 2 weeks ago
- A cool AI Diagram generator from a given topic, that streams the partial diagrams from the incomplete JSONs during generation. Built usin… · ⭐ 208 · Updated 11 months ago
- A flexible HTTP fetching Model Context Protocol server. · ⭐ 193 · Updated 2 months ago
- Convert any PDF into a podcast episode! · ⭐ 707 · Updated 2 weeks ago
- openperplex is an open-source AI search engine · ⭐ 849 · Updated 7 months ago
- A simple Python sandbox for helpful LLM data agents · ⭐ 241 · Updated 9 months ago
- 90% of what you need for LLM app development. Nothing you don't. · ⭐ 252 · Updated this week
- An experiment in meeting transcription and diarization with just an LLM. Maybe I went a little overboard though · ⭐ 536 · Updated last month
- 💻📚💡 DoctorGPT provides advanced LLM prompting for PDFs and webpages. · ⭐ 245 · Updated last year
- A lightweight task engine for building stateful AI agents that prioritizes simplicity and flexibility. · ⭐ 914 · Updated this week
- Your first AI prompt engineer · ⭐ 369 · Updated 4 months ago
- MCP server for fetching web page content using a Playwright headless browser. · ⭐ 450 · Updated this week
- Yet another open source Perplexity · ⭐ 433 · Updated 5 months ago
- Scrape the webpage, convert it into Markdown, and enhance AI search applications. · ⭐ 246 · Updated 10 months ago
- Turn local files into a prompt for an LLM · ⭐ 170 · Updated 2 months ago
- 🥤 RAGLite is a Python toolkit for Retrieval-Augmented Generation (RAG) with PostgreSQL or SQLite · ⭐ 876 · Updated 2 weeks ago
- Unlock 650+ MCP server tools in your favorite agentic framework. · ⭐ 230 · Updated this week
- Connect and chat with your multiple documents (pdf and txt) through GPT 3.5, GPT-4 Turbo, Claude and Local Open-Source LLMs · ⭐ 791 · Updated last month