paulpierre / markdown-crawler
A multithreaded 🕸️ web crawler that recursively crawls a website and creates a 🔽 markdown file for each page, designed for LLM RAG
★394 · Updated 11 months ago
Alternatives and similar repositories for markdown-crawler
Users who are interested in markdown-crawler are comparing it to the libraries listed below
- Parse PDFs into markdown using Vision LLMs · ★409 · Updated 6 months ago
- Yet another open source Perplexity · ★450 · Updated 9 months ago
- HTML to Markdown converter and crawler. · ★583 · Updated last year
- 90% of what you need for LLM app development. Nothing you don't. · ★266 · Updated last month
- This is an adapted version of Jina AI's Reader for local deployment using Docker. Convert any URL to an LLM-friendly input with a simp… · ★237 · Updated 3 weeks ago
- SearchGPT / Perplexity Pages clone, but personalised for you. · ★244 · Updated 11 months ago
- ScribeWizard: Generate organized notes from audio using Groq, Whisper, and Llama3 · ★494 · Updated 6 months ago
- ★231 · Updated last month
- Easily deployable API to convert PDF to markdown quickly with high accuracy. · ★881 · Updated 9 months ago
- Visualize Different Text Splitting Methods · ★284 · Updated 7 months ago
- Extract structured text from PDFs quickly · ★523 · Updated last month
- A simple Python program to implement the search-extract-summarize flow. · ★269 · Updated last month
- ★191 · Updated last month
- Summarize and query from a lot of heterogeneous documents. Any LLM provider, any filetype, advanced RAG, advanced summaries, scriptable, … · ★475 · Updated 2 weeks ago
- Super performant RAG pipelines for AI apps. Summarization, Retrieve/Rerank and Code Interpreters in one simple API. · ★381 · Updated last year
- This project enhances the construction of RAG applications by addressing challenges, improving accessibility, scalability, and managing d… · ★146 · Updated last year
- Clone of https://r.jina.ai which is deployable locally · ★47 · Updated 10 months ago
- ★88 · Updated last year
- No-code ETL and data pipelines with AI and NLP · ★317 · Updated 5 months ago
- ⚡ Chat with GitHub Repo Using 200k context window of Claude instead of RAG! ⚡ · ★169 · Updated last year
- Your first AI prompt engineer · ★401 · Updated last month
- Connect and chat with your multiple documents (PDF and TXT) through GPT 3.5, GPT-4 Turbo, Claude and Local Open-Source LLMs · ★799 · Updated 6 months ago
- LLM for Long Text Summary (Comprehensive Bulleted Notes) · ★580 · Updated last month
- Web scraper made with AI and simplicity in mind. It runs as a CLI that can be parallelized and outputs high-quality markdown content. · ★524 · Updated this week
- Structured information extraction from documents · ★317 · Updated 10 months ago
- Excel spreadsheet crawler and table parser for data extraction and querying · ★150 · Updated 5 months ago
- An experimental UI for text-to-knowledge-graph generation · ★777 · Updated last year
- The simplest open-source implementation of perplexity.ai · ★316 · Updated 6 months ago
- Self-hosted version of Microsoft's OmniParser Image-to-text model · ★71 · Updated 2 months ago
- Turn Webpage to LLM friendly input text. Similar to Firecrawl and Jina Reader API. Makes RAG, AI web scraping, image & webpage links extr… · ★206 · Updated last month