huggingface / OBELICS

Code used for the creation of OBELICS, an open, massive and curated collection of interleaved image-text web documents, containing 141M documents, 115B text tokens and 353M images.
196 · Updated 5 months ago

Alternatives and similar repositories for OBELICS:

Users interested in OBELICS are comparing it to the libraries listed below.