OpenMatch / NeuScraper
[ACL 2024] This is the code repo for our ACL’24 paper "Cleaner Pretraining Corpus Curation with Neural Web Scraping".
☆227 · Updated last year
Alternatives and similar repositories for NeuScraper
Users that are interested in NeuScraper are comparing it to the libraries listed below
- This is the code repo for our paper "Autonomously Knowledge Assimilation and Accommodation through Retrieval-Augmented Agents". ☆107 · Updated 10 months ago
- A lightweight script for processing HTML pages into Markdown format, with support for code blocks. ☆79 · Updated last year
- Official repo for "LongRAG: Enhancing Retrieval-Augmented Generation with Long-context LLMs". ☆238 · Updated last year
- [ICML 2025] Programming Every Example: Lifting Pre-training Data Quality Like Experts at Scale. ☆258 · Updated last month
- [Preprint] Learning to Filter Context for Retrieval-Augmented Generation. ☆194 · Updated last year
- Official repository for RAGViz: Diagnose and Visualize Retrieval-Augmented Generation [EMNLP 2024]. ☆85 · Updated 7 months ago
- Official repo for "Make Your LLM Fully Utilize the Context". ☆253 · Updated last year
- Deep Reasoning Translation (DRT) Project