cxcscmu / Craw4LLM
Official repository for "Craw4LLM: Efficient Web Crawling for LLM Pretraining"
☆612 Updated last month
Alternatives and similar repositories for Craw4LLM:
Users who are interested in Craw4LLM are comparing it to the repositories listed below:
- OmniThink: Expanding Knowledge Boundaries in Machine Writing through Thinking ☆443 Updated 2 weeks ago
- Repository for the demo and paper: ReasonGraph: Visualisation of Reasoning Paths ☆458 Updated 3 weeks ago
- OpenAI DeepResearch alternative, An AI-driven research system that performs comprehensive, iterative research on any topic using multiple… ☆483 Updated last month
- Repo for NAACL 2025 Paper "Unfolding the Headline: Iterative Self-Questioning for News Retrieval and Timeline Summarization" ☆268 Updated 2 months ago
- 🌐 WebWalker: Benchmarking LLMs in Web Traversal ☆382 Updated last week
- ViDoRAG: Visual Document Retrieval-Augmented Generation via Dynamic Iterative Reasoning Agents ☆450 Updated last month
- ☆235 Updated 7 months ago
- Speech to Text but with all the bells and whistles and most importantly AI! AI will clean up your filler words, edit and will refine what… ☆297 Updated 2 months ago
- Mentis: A powerful multi-agent orchestration framework built on LangGraph. ☆182 Updated this week
- Query and Summarize your chat messages. ☆902 Updated 4 months ago
- [ICLR 2025] The First Multimodal Search Engine Pipeline and Benchmark for LMMs ☆427 Updated 2 months ago
- Scira (Formerly MiniPerplx) is a minimalistic AI-powered search engine that helps you find information on the internet. Powered by Vercel… ☆116 Updated last month
- ☆572 Updated 5 months ago
- AI ContentCraft is an all-in-one content creation suite that helps creators generate stories, podcast scripts, and multimedia content usi… ☆330 Updated 2 months ago
- recursive rag with r1 reasoning ☆280 Updated last month
- Dify Schedule. A free scheduling assistant for Dify workflows with multi-channel notifications; an automated scheduling solution for Dify, powered by GitHub Actions. ☆520 Updated 2 months ago
- "Your Fully-Automated Personal AI Assistant, and Open-Source & Cost-Efficient Alternative to OpenAI's Deep Research" ☆913 Updated 2 weeks ago
- A Model Context Protocol server for converting almost anything to Markdown ☆1,359 Updated 2 months ago
- Curated MCP resources, MCP guides, Claude MCP, MCP Servers, MCP Clients ☆824 Updated last week
- Full Stack application for retrieving Stock Data and News using LLM, LangChain and LangGraph ☆589 Updated 4 months ago
- Deep research agent to help you find the best GitHub repositories 🕵️! ☆562 Updated last week
- MoLing is a computer-use and browser-use based MCP server. It is a locally deployed, dependency-free office AI assistant. ☆255 Updated this week
- 🍎APPL: A Prompt Programming Language. Seamlessly integrate LLMs with programs. ☆244 Updated 2 months ago
- VoiceCanvas, a text-to-speech system with Stripe payment support, voice cloning, 50+ languages, and selectable voices; code is 100% open source ☆305 Updated 2 weeks ago
- AI video note generation tool: let AI take notes on your videos ☆99 Updated this week
- The Level-Navi Agent, a framework that requires no training and utilizes large language models for deep query understanding and precise s… ☆78 Updated 3 months ago
- Trans Router ☆161 Updated 3 months ago
- Your first AI prompt engineer ☆372 Updated 5 months ago
- ☆436 Updated 2 months ago
- A Model Context Protocol server for searching and analyzing arXiv papers ☆921 Updated last week