Official repository for "Craw4LLM: Efficient Web Crawling for LLM Pretraining"
☆654 · Feb 24, 2025 · Updated last year
Alternatives and similar repositories for Craw4LLM
Users that are interested in Craw4LLM are comparing it to the libraries listed below.
- This repository contains the resource introduced in the paper: "Truth or Mirage? Towards End-to-End Factuality Evaluation with LLM-Oasis"… ☆25 · Oct 15, 2025 · Updated 8 months ago
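- NeuraPress is a modern Markdown editor focused on delivering a high-quality formatting experience for WeChat Official Accounts. It has a responsive design and supports mobile devices. Paired with DeepSeek and the WeChat Official Account assistant, it lets you publish formatted articles from your phone in spare moments. ☆1,809 · Apr 21, 2026 · Updated 2 months ago
- TrendPublish: fully automated AI content generation and publishing system | WeChat Official Account automation | multi-source data scraping (Twitter/X, websites) | DeepseekAI, Qwen, and iFlytek models | intelligent content analysis and ranking | scheduled publishing | multi-template support | Node.js | TypeScript |… ☆3,033 · Jun 14, 2026 · Updated 2 weeks ago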
- [ACL 2025 Demo] Repository for the demo and paper: ReasonGraph: Visualisation of Reasoning Paths ☆512 · Mar 9, 2026 · Updated 3 months ago
- Summarize and query from a lot of heterogeneous documents. Any LLM provider, any filetype, advanced RAG, advanced summaries, scriptable, … ☆518 · Jun 23, 2026 · Updated last week
- The first open-source agent skills builder. Define skills by vibe workflow, run on Claude Code, Cursor, Codex & more. Build Clawdbot 🦞· … ☆7,403 · Mar 25, 2026 · Updated 3 months ago
- Speech to Text but with all the bells and whistles and most importantly AI! AI will clean up your filler words, edit and will refine what… ☆335 · Feb 9, 2025 · Updated last year
- Open-source, production-ready customer service with built-in evals and monitoring ☆1,452 · Jun 18, 2026 · Updated 2 weeks ago
- ☆306 · Aug 23, 2024 · Updated last year
- PIKE-RAG: sPecIalized KnowledgE and Rationale Augmented Generation ☆2,387 · Sep 10, 2025 · Updated 9 months ago
- AI ContentCraft is an all-in-one content creation suite that helps creators generate stories, podcast scripts, and multimedia content usi… ☆395 · Jul 4, 2025 · Updated last year
- The Level-Navi Agent, a framework that requires no training and utilizes large language models for deep query understanding and precise s… ☆82 · Dec 27, 2024 · Updated last year
- Query and summarize your chat messages. ☆1,030 · Dec 4, 2024 · Updated last year
- A tool to assist in the interpretation of learned features in sparse autoencoders (in particular the four SAEs trained by Joseph Bloom o… ☆19 · Oct 4, 2024 · Updated last year
- [AAAI 2025] StoryWeaver: A Unified World Model for Knowledge-Enhanced Story Character Customization ☆226 · Jan 13, 2026 · Updated 5 months ago
- Web archiving utility library ☆11 · Jun 19, 2026 · Updated 2 weeks ago
- Code for ACL25-findings. An LLM-based agent simulation framework that simulates human behavior and generates dynamic, text-based social g… ☆96 · Mar 15, 2026 · Updated 3 months ago
- [ICLR 2026] A Training-free Iterative Framework for Long Story Visualization ☆958 · Apr 2, 2026 · Updated 3 months ago
- AI reads books: Page-by-Page PDF Knowledge Extractor & Summarizer. The script performs an intelligent page-by-page analysis of PDF books, met… ☆2,154 · Jan 20, 2025 · Updated last year
- RAG Web UI is an intelligent dialogue system based on RAG (Retrieval-Augmented Generation) technology. ☆3,050 · Apr 6, 2026 · Updated 2 months ago
- An AI money-making team for everyone, helping you turn your experience and methods into a business. ☆8,276 · Jun 25, 2026 · Updated last week
- Toolkit for linearizing PDFs for LLM datasets/training ☆17,412 · Mar 25, 2026 · Updated 3 months ago
- 🔍 An LLM-based Multi-agent Framework of Web Search Engine (like Perplexity.ai Pro and SearchGPT) ☆6,885 · Jul 4, 2025 · Updated last year
- Connect all your AI products to WeChat to build a personal AI assistant that helps you with more of your everyday life. ☆423 · Mar 11, 2026 · Updated 3 months ago
- ☆30 · Mar 16, 2026 · Updated 3 months ago
- Perplexity-style AI search engine clone built with Gemini 2.0 Flash and Grounding ☆2,064 · Jan 4, 2025 · Updated last year
- Open Source Deep Research Alternative to Reason and Search on Private Data. Written in Python. ☆7,905 · Nov 19, 2025 · Updated 7 months ago
- Keep searching, reading webpages, and reasoning until it finds the answer (or exceeds the token budget) ☆5,192 · May 1, 2026 · Updated 2 months ago
- ☆2,772 · May 2, 2025 · Updated last year
- Pocket AI: put the world's knowledge in your pocket. Chinese edition of pocketpal-ai ☆583 · Feb 8, 2025 · Updated last year
- Netflix-level subtitle cutting, translation, alignment, and even dubbing: a one-click, fully automated AI video subtitle team ☆17,579 · Jun 13, 2026 · Updated 3 weeks ago
- "AutoAgent: Fully-Automated and Zero-Code LLM Agent Framework" ☆9,401 · Oct 16, 2025 · Updated 8 months ago
- Dolphin is a multilingual, multitask ASR model jointly trained by DataoceanAI and Tsinghua University. ☆765 · Jun 11, 2026 · Updated 3 weeks ago
- LINEBot ☆13 · Apr 7, 2025 · Updated last year
- Transform PDFs into AI podcasts for engaging on-the-go audio content. ☆845 · Jun 26, 2026 · Updated last week
- [EMNLP 2025] OmniThink: Expanding Knowledge Boundaries in Machine Writing through Thinking ☆485 · Aug 23, 2025 · Updated 10 months ago
- ☆49 · Sep 11, 2025 · Updated 9 months ago
- A modern full-stack AI chatbot application built with React and Cloudflare Workers using Connect RPC, supporting web, mobile, and desktop via Tauri ☆567 · Jun 6, 2025 · Updated last year
- Generate high-definition short story videos with one click using large AI models. ☆2,414 · Mar 12, 2025 · Updated last year