jcpeterson / openwebtext

Open clone of OpenAI's unreleased WebText dataset scraper. This version uses pushshift.io files instead of the API for speed.
714 · Updated last year

Related projects

Alternatives and complementary repositories for openwebtext