paulpierre / markdown-crawler
A multithreaded 🕸️ web crawler that recursively crawls a website and creates a 🔽 markdown file for each page, designed for LLM RAG
⭐383 Updated 9 months ago
Alternatives and similar repositories for markdown-crawler
Users who are interested in markdown-crawler are comparing it to the libraries listed below.
- Extract structured text from pdfs quickly ⭐475 Updated 2 months ago
- Super performant RAG pipelines for AI apps. Summarization, Retrieve/Rerank and Code Interpreters in one simple API. ⭐373 Updated last year
- Visualize Different Text Splitting Methods ⭐256 Updated 4 months ago
- HTML to Markdown converter and crawler. ⭐541 Updated last year
- Your first AI prompt engineer ⭐377 Updated 6 months ago
- Parse PDFs into markdown using Vision LLMs ⭐369 Updated 3 months ago
- 90% of what you need for LLM app development. Nothing you don't. ⭐260 Updated 3 weeks ago
- A fast, lightweight and easy-to-use Python library for splitting text into semantically meaningful chunks. ⭐302 Updated last month
- Structured information extraction from documents ⭐314 Updated 7 months ago
- ScribeWizard: Generate organized notes from audio using Groq, Whisper, and Llama3 ⭐487 Updated 3 months ago
- High-performance retrieval engine for unstructured data ⭐1,378 Updated this week
- ⭐182 Updated this week
- Prompt optimization scratch ⭐730 Updated last month
- An experimental UI for text-to-knowledge-graph generation ⭐772 Updated last year
- Summarize and query from a lot of heterogeneous documents. Any LLM provider, any filetype, advanced RAG, advanced summaries, scriptable, … ⭐456 Updated this week
- Scrape the webpage, convert it into Markdown, and enhance AI search applications. ⭐247 Updated last year
- clean & curate your data with LLMs. ⭐490 Updated 10 months ago
- podcastfy.ai gradio demo app ⭐332 Updated 5 months ago
- ⭐223 Updated 5 months ago
- ⭐727 Updated 2 weeks ago
- The simplest open-source implementation of perplexity.ai ⭐310 Updated 3 months ago
- ⭐453 Updated 2 months ago
- Edge full-stack LLM platform. Written in Rust ⭐378 Updated 11 months ago
- RESTai is an AIaaS (AI as a Service) open-source platform. Built on top of LlamaIndex & Langchain. Supports any public LLM supported by L… ⭐424 Updated this week
- Teach LangChain using LangChain! ⭐258 Updated last year
- 🔥 This repository contains complete application examples, including websites and other projects, developed using Firecrawl. ⭐343 Updated this week
- SearchGPT / Perplexity Pages clone, but personalised for you. ⭐237 Updated 8 months ago
- A simple Python program to implement the search-extract-summarize flow. ⭐262 Updated 3 months ago
- Tuning and Evaluation of RAG pipeline. (Automated optimization to be added soon) ⭐263 Updated last year
- Clone of https://r.jina.ai which is deployable locally ⭐44 Updated 8 months ago