huggingface / fineweb-2
☆110 Updated 5 months ago
Alternatives and similar repositories for fineweb-2:
Users who are interested in fineweb-2 are comparing it to the libraries listed below.
- Official repository for "Scaling Retrieval-Based Language Models with a Trillion-Token Datastore". ☆199 Updated this week
- Manage scalable open LLM inference endpoints in Slurm clusters ☆255 Updated 9 months ago
- Official repository for the paper "ReasonIR: Training Retrievers for Reasoning Tasks". ☆112 Updated last week
- ☆117 Updated last month
- EvolKit is an innovative framework designed to automatically enhance the complexity of instructions used for fine-tuning Large Language M… ☆216 Updated 6 months ago
- Code for Zero-Shot Tokenizer Transfer ☆127 Updated 3 months ago
- 🚢 Data Toolkit for Sailor Language Models ☆90 Updated 2 months ago
- This is the official repository for Inheritune. ☆111 Updated 3 months ago
- LOFT: A 1 Million+ Token Long-Context Benchmark ☆192 Updated 2 weeks ago
- PyTorch building blocks for the OLMo ecosystem ☆205 Updated this week
- code for training & evaluating Contextual Document Embedding models ☆183 Updated 3 weeks ago
- The HELMET Benchmark ☆142 Updated 3 weeks ago
- Large language models (LLMs) made easy. EasyLM is a one-stop solution for pre-training, finetuning, evaluating and serving LLMs in JAX/Fl… ☆72 Updated 8 months ago
- ☆27 Updated this week
- Reproducible, flexible LLM evaluations ☆198 Updated last month
- Positional Skip-wise Training for Efficient Context Window Extension of LLMs to Extreme Lengths (ICLR 2024) ☆205 Updated 11 months ago
- Codebase accompanying the Summary of a Haystack paper. ☆77 Updated 7 months ago
- Official code for "MAmmoTH2: Scaling Instructions from the Web" [NeurIPS 2024] ☆139 Updated 6 months ago
- A framework to study AI models in Reasoning, Alignment, and use of Memory (RAM). ☆226 Updated this week
- Lightweight demos for finetuning LLMs. Powered by 🤗 transformers and open-source datasets. ☆76 Updated 6 months ago
- A pipeline for LLM knowledge distillation ☆101 Updated last month
- [ICML 2025] Programming Every Example: Lifting Pre-training Data Quality Like Experts at Scale ☆244 Updated 3 weeks ago
- Code for "Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate" ☆141 Updated 2 weeks ago
- ☆117 Updated 8 months ago
- ☆120 Updated 7 months ago
- Benchmarking LLMs with Challenging Tasks from Real Users ☆221 Updated 6 months ago
- ☆147 Updated last year
- A framework for few-shot evaluation of language models. ☆30 Updated last month
- ☆47 Updated 8 months ago
- A simple unified framework for evaluating LLMs ☆209 Updated 3 weeks ago