OpenMatch / NeuScraper

[ACL 2024] This is the code repo for our ACL’24 paper "Cleaner Pretraining Corpus Curation with Neural Web Scraping".
221 · Updated 5 months ago

Alternatives and similar repositories for NeuScraper:

Users who are interested in NeuScraper are comparing it to the libraries listed below.