A list of useful Open Source tools and scrapers to gather data for LLMs
☆247Feb 24, 2025Updated last year
Alternatives and similar repositories for llm-data-scrapers
Users who are interested in llm-data-scrapers are comparing it to the libraries listed below
- Personal project, Generative AI, Streamlit, Python☆54Apr 30, 2025Updated 10 months ago
- ☆67Feb 13, 2025Updated last year
- Radiantloom Email Assist 7B is an email-assistant large language model fine-tuned from Zephyr-7B-Beta, over a custom-curated dataset of 1…☆14Jan 19, 2024Updated 2 years ago
- Upload SQLite database files to Datasette☆14Nov 10, 2025Updated 3 months ago
- Open-source, production-ready customer service with built-in evals and monitoring☆1,437Jan 12, 2026Updated last month
- ☆12Apr 17, 2023Updated 2 years ago
- Measure how understandable a German text is.☆11Feb 11, 2026Updated 3 weeks ago
- AI reads books: Page-by-Page PDF Knowledge Extractor & Summarizer. The script performs an intelligent page-by-page analysis of PDF books, met…☆1,578Jan 20, 2025Updated last year
- Open-Source, Free, and AI-Powered News in Short☆410Jan 4, 2025Updated last year
- Workshop: Build with Gemini☆317Jul 7, 2025Updated 8 months ago
- Make any LLM think like OpenAI o1 and DeepSeek R1☆489Feb 6, 2025Updated last year
- All tools developed by myself for personal purposes.☆16Feb 1, 2026Updated last month
- Repository of helpers to run chaos experiments with k6☆12Nov 2, 2022Updated 3 years ago
- Turn topics into essays in seconds!☆192Jul 6, 2025Updated 8 months ago
- ⚖️ Awesome LLM Judges ⚖️☆189Apr 28, 2025Updated 10 months ago
- Converts webpage content into Markdown format, optimized for LLM training and context☆16Feb 26, 2025Updated last year
- ☆33Nov 21, 2025Updated 3 months ago
- December 14th Python Meetup Files☆39Mar 2, 2013Updated 13 years ago
- A fully autonomous AI artist☆19Jun 19, 2023Updated 2 years ago
- Example code and guides for building with Scrapybara☆139Mar 20, 2025Updated 11 months ago
- An AI-driven daily arXiv paper crawler, analyzer, and organizer tool, focusing on AIGC☆78Updated this week
- 📃 A better UX for chat, writing content, and coding with LLMs.☆5,389Feb 25, 2026Updated last week
- A Deep Research agent from scratch☆216May 18, 2025Updated 9 months ago
- Build and deploy AI-powered APIs in seconds☆753Feb 3, 2025Updated last year
- ContextGem: Effortless LLM extraction from documents☆1,808Feb 22, 2026Updated 2 weeks ago
- ☆17May 8, 2024Updated last year
- A simple Agent Development Kit starter repo with one agent that can get the top Hacker News posts and the trending GitHub repos☆17Apr 21, 2025Updated 10 months ago
- Witsy: desktop AI assistant / universal MCP client☆1,896Updated this week
- A practical approach to managing multiple AI agents in Cursor through strict file-tree partitioning and domain boundaries.☆643Nov 19, 2025Updated 3 months ago
- A UI for Ollama on Mac☆17Jan 22, 2024Updated 2 years ago
- LLM plugin for asking questions of LLM's own documentation, and related packages☆28May 5, 2025Updated 10 months ago
- Vision infrastructure to turn complex documents into RAG/LLM-ready data☆2,940Sep 24, 2025Updated 5 months ago
- Easily create scalable, monetisable backend APIs with Hono + Cloudflare workers. All the batteries included.☆550Feb 2, 2025Updated last year
- 🔥 Generate llms.txt and llms-full.txt files for any website!☆515Jun 17, 2025Updated 8 months ago
- Code examples showing how to use Gemini, Gemma, Imagen, and more.☆50Jan 21, 2026Updated last month
- A portfolio website template based on Notion + Astro that can be deployed directly to Vercel☆16Aug 28, 2025Updated 6 months ago
- The SDK interface to Letta Code. Build deeply personalized agents with persistent memory that learn over time.☆50Updated this week
- ☆28May 30, 2025Updated 9 months ago
- This package is the Python implementation of Deepgram's WebVTT and SRT formatting. Given a transcription, this package can return a valid…☆22Oct 7, 2024Updated last year