paulpierre / markdown-crawler
A multithreaded 🕸️ web crawler that recursively crawls a website and creates a 🔽 markdown file for each page, designed for LLM RAG
⭐377 · Updated 8 months ago
Alternatives and similar repositories for markdown-crawler:
Users who are interested in markdown-crawler are comparing it to the libraries listed below.
- HTML to Markdown converter and crawler. ⭐538 · Updated last year
- Lightweight, performant, deep table extraction ⭐453 · Updated 3 weeks ago
- Extract structured text from pdfs quickly ⭐469 · Updated last month
- Structured information extraction from documents ⭐313 · Updated 6 months ago
- Parse PDFs into markdown using Vision LLMs ⭐345 · Updated 2 months ago
- ⭐177 · Updated last week
- Prompt optimization scratch ⭐699 · Updated last week
- TF-ID: Table/Figure IDentifier for academic papers ⭐232 · Updated 9 months ago
- Detect and extract tables to markdown and csv ⭐742 · Updated 3 months ago
- Your first AI prompt engineer ⭐373 · Updated 5 months ago
- ⭐713 · Updated 2 months ago
- Yet another open source Perplexity ⭐438 · Updated 6 months ago
- Python & JS/TS SDK for running AI-generated code/code interpreting in your AI app ⭐1,682 · Updated this week
- The Open Source Memory Layer For Autonomous Agents ⭐2,178 · Updated 6 months ago
- A cool AI Diagram generator from a given topic, that streams the partial diagrams from the incomplete JSONs during generation. Built usin… ⭐210 · Updated last year
- Tuning and Evaluation of RAG pipeline. (Automated optimization to be added soon) ⭐263 · Updated last year
- SearchGPT / Perplexity Pages clone, but personalised for you. ⭐236 · Updated 7 months ago
- An experiment in meeting transcription and diarization with just an LLM. Maybe I went a little overboard though ⭐541 · Updated 2 weeks ago
- An enterprise-grade AI retriever designed to streamline AI integration into your applications, ensuring cutting-edge accuracy. ⭐284 · Updated last week
- Super performant RAG pipelines for AI apps. Summarization, Retrieve/Rerank and Code Interpreters in one simple API. ⭐369 · Updated 11 months ago
- Connect and chat with your multiple documents (pdf and txt) through GPT 3.5, GPT-4 Turbo, Claude and Local Open-Source LLMs ⭐792 · Updated 2 months ago
- 90% of what you need for LLM app development. Nothing you don't. ⭐254 · Updated 3 weeks ago
- A system for agentic LLM-powered data processing and ETL ⭐1,767 · Updated this week
- Teach LangChain using LangChain! ⭐257 · Updated last year
- An open-source OCR API that leverages OpenAI's powerful language models with optimized performance techniques like parallel processing an… ⭐848 · Updated 7 months ago
- clean & curate your data with LLMs. ⭐487 · Updated 9 months ago
- ⭐264 · Updated 10 months ago
- OpenResearcher, an advanced Scientific Research Assistant ⭐439 · Updated 6 months ago
- Visualize Different Text Splitting Methods ⭐245 · Updated 3 months ago
- Flexible and powerful multi-agent AI framework ⭐353 · Updated 2 weeks ago