cxcscmu / Craw4LLM
Official repository for "Craw4LLM: Efficient Web Crawling for LLM Pretraining"
☆627 Updated 3 months ago
Alternatives and similar repositories for Craw4LLM
Users who are interested in Craw4LLM are comparing it to the repositories listed below
- OmniThink: Expanding Knowledge Boundaries in Machine Writing through Thinking ☆453 Updated 2 months ago
- Repo for NAACL 2025 Paper "Unfolding the Headline: Iterative Self-Questioning for News Retrieval and Timeline Summarization" ☆277 Updated 5 months ago
- [ACL 2025 Demo] Repository for the demo and paper: ReasonGraph: Visualisation of Reasoning Paths ☆487 Updated 3 weeks ago
- A simple agent framework that's capable of browser use + mcp + auto instrument + plan + deep research + more ☆279 Updated 3 weeks ago
- OpenAI DeepResearch alternative, An AI-driven research system that performs comprehensive, iterative research on any topic using multiple… ☆613 Updated 2 weeks ago
- [ICLR 2025] The First Multimodal Search Engine Pipeline and Benchmark for LMMs ☆438 Updated 4 months ago
- ViDoRAG: Visual Document Retrieval-Augmented Generation via Dynamic Iterative Reasoning Agents ☆496 Updated last week
- ☆252 Updated 10 months ago
- Speech to Text but with all the bells and whistles and most importantly AI! AI will clean up your filler words, edit and will refine what… ☆314 Updated 4 months ago
- ☆906 Updated this week
- Mentis: A powerful multi-agent orchestration framework built on LangGraph. ☆246 Updated last month
- Secretary is an AI-powered tool that analyzes social media content from specified accounts and delivers results via WeChat. It supports c… ☆327 Updated last month
- python package to parse pdfs with different parsers ☆186 Updated 6 months ago
- Scira (Formerly MiniPerplx) is a minimalistic AI-powered search engine that helps you find information on the internet. Powered by Vercel… ☆121 Updated 3 months ago
- 🌐 WebWalker [ACL2025] & WebDancer [Preprint] ☆1,069 Updated 2 weeks ago
- The world's first Full-Stack Open-Source General AI Agent ☆471 Updated this week
- Seed-Coder is a family of lightweight open-source code LLMs comprising base, instruct and reasoning models, developed by ByteDance Seed. ☆500 Updated 2 weeks ago
- ☆466 Updated 3 months ago
- Python library for Agentic Document Extraction from LandingAI ☆679 Updated this week
- ☆211 Updated 2 weeks ago
- The Level-Navi Agent, a framework that requires no training and utilizes large language models for deep query understanding and precise s… ☆79 Updated 5 months ago
- Official MiniMax Model Context Protocol (MCP) server that enables interaction with powerful Text to Speech, image generation and video ge… ☆640 Updated this week
- Query and Summarize your chat messages. ☆982 Updated 6 months ago
- recursive rag with r1 reasoning ☆319 Updated last month
- A General-Purpose AI Agent ✨ ☆354 Updated 3 weeks ago
- ☆212 Updated 5 months ago
- ☆579 Updated 7 months ago
- FlexRAG: A RAG Framework for Information Retrieval and Generation. ☆172 Updated this week
- 🍎APPL: A Prompt Programming Language. Seamlessly integrate LLMs with programs. ☆251 Updated 4 months ago
- ZeroSearch: Incentivize the Search Capability of LLMs without Searching ☆1,010 Updated last week