cxcscmu / Craw4LLM
Official repository for "Craw4LLM: Efficient Web Crawling for LLM Pretraining"
☆595Updated last month
Alternatives and similar repositories for Craw4LLM:
Users interested in Craw4LLM are comparing it to the repositories listed below.
- Repository for the demo and paper: ReasonGraph: Visualisation of Reasoning Paths☆404Updated this week
- OmniThink: Expanding Knowledge Boundaries in Machine Writing through Thinking☆436Updated this week
- An OpenAI DeepResearch alternative: an AI-driven research system that performs comprehensive, iterative research on any topic using multiple…☆461Updated 2 weeks ago
- 🌐 WebWalker: Benchmarking LLMs in Web Traversal☆378Updated last week
- Speech to text with all the bells and whistles and, most importantly, AI! AI will clean up your filler words, edit, and refine what…☆285Updated last month
- Repo for NAACL 2025 Paper "Unfolding the Headline: Iterative Self-Questioning for News Retrieval and Timeline Summarization"☆263Updated 2 months ago
- ViDoRAG: Visual Document Retrieval-Augmented Generation via Dynamic Iterative Reasoning Agents☆412Updated last week
- [ICLR 2025] The First Multimodal Search Engine Pipeline and Benchmark for LMMs☆421Updated 2 months ago
- Query and Summarize your chat messages.☆834Updated 3 months ago
- A Model Context Protocol server for converting almost anything to Markdown☆1,047Updated 2 months ago
- Scira (Formerly MiniPerplx) is a minimalistic AI-powered search engine that helps you find information on the internet. Powered by Vercel…☆114Updated last month
- An open-sourced end-to-end VLM-based GUI Agent☆845Updated last month
- GPT-4o-level, real-time spoken dialogue system.☆302Updated 2 months ago
- One-Person Company AI Tools Series – continuously updated to help boost productivity and empower you…☆190Updated this week
- The Level-Navi Agent, a framework that requires no training and utilizes large language models for deep query understanding and precise s…☆78Updated 3 months ago
- FlexRAG: A RAG Framework for Information Retrieval and Generation.☆136Updated this week
- Dify Schedule. A free automated scheduling assistant for Dify workflows, with notifications over multiple channels, powered by GitHub Actions.☆484Updated last month
- Your first AI prompt engineer☆369Updated 4 months ago
- An open-source solution for full parameter fine-tuning of DeepSeek-V3/R1 671B, including complete code and scripts from training to infer…☆515Updated 2 weeks ago
- Markdown Conversion☆274Updated last week
- TxAgent: An AI Agent for Therapeutic Reasoning Across a Universe of Tools☆370Updated this week
- Trans Router☆158Updated 2 months ago
- 🍎APPL: A Prompt Programming Language. Seamlessly integrate LLMs with programs.☆242Updated last month
- Full Stack application for retrieving Stock Data and News using LLM, LangChain and LangGraph☆538Updated 3 months ago