A collection of datasets for language model pretraining, including scripts for downloading, preprocessing, and sampling.
☆65 Jul 29, 2024 Updated last year
Alternatives and similar repositories for llm-datasets
Users that are interested in llm-datasets are comparing it to the libraries listed below.
- ☆11 Aug 2, 2022 Updated 3 years ago
- A framework for few-shot evaluation of autoregressive language models. ☆13 Feb 14, 2024 Updated 2 years ago
- Codes for "Benchmarking the Generation of Fact Checking Explanations" ☆10 Aug 16, 2024 Updated last year
- ☆39 Dec 18, 2023 Updated 2 years ago
- Analytic platform for the HAL research archive (in development) ☆12 Oct 2, 2020 Updated 5 years ago
- German Text Embedding Clustering Benchmark ☆18 Mar 15, 2024 Updated 2 years ago
- Materials for "Quantifying the Plausibility of Context Reliance in Neural Machine Translation" at ICLR'24 🐑 🐑 ☆16 Apr 18, 2024 Updated 2 years ago
- Load, build and explore Patstat using the Google Cloud Platform ☆10 Jan 19, 2019 Updated 7 years ago
- Specification of a stand-off element for the TEI guidelines ☆12 Apr 29, 2021 Updated 5 years ago
- A Python package to run inference with HuggingFace language and vision-language checkpoints, wrapping many convenient features. ☆28 Sep 14, 2024 Updated last year
- A Knowledge Base for research software relying on large-scale text mining and curated knowledge sources ☆18 May 14, 2023 Updated 3 years ago
- Train transformer-based models.