huggingface / fineweb-2
☆126Updated last week
Alternatives and similar repositories for fineweb-2
Users who are interested in fineweb-2 are comparing it to the repositories listed below
- Official repository for the paper "ReasonIR: Training Retrievers for Reasoning Tasks".☆170Updated 3 weeks ago
- Manage scalable open LLM inference endpoints in Slurm clusters☆260Updated 11 months ago
- Official repository for "Scaling Retrieval-Based Language Models with a Trillion-Token Datastore".☆205Updated 2 weeks ago
- 🚢 Data Toolkit for Sailor Language Models☆92Updated 4 months ago
- PyTorch building blocks for the OLMo ecosystem☆238Updated this week
- Complex Function Calling Benchmark.☆114Updated 5 months ago
- Benchmarking LLMs with Challenging Tasks from Real Users☆226Updated 7 months ago
- ☆124Updated 2 months ago
- A framework to study AI models in Reasoning, Alignment, and use of Memory (RAM).☆255Updated last week
- Official code for "MAmmoTH2: Scaling Instructions from the Web" [NeurIPS 2024]☆144Updated 7 months ago
- Reproducible, flexible LLM evaluations☆214Updated last month
- Organize the Web: Constructing Domains Enhances Pre-Training Data Curation☆55Updated last month
- This is the official repository for Inheritune.☆111Updated 4 months ago
- Benchmarking library for RAG☆209Updated 2 weeks ago
- ☆150Updated last year
- Pretraining Efficiently on S2ORC!☆164Updated 8 months ago
- Code for training & evaluating Contextual Document Embedding models☆195Updated last month
- A simplified implementation for experimenting with RLVR on GSM8K. This repository provides a starting point for exploring reasoning.☆101Updated 4 months ago
- MultilingualSIFT: Multilingual Supervised Instruction Fine-tuning☆91Updated last year
- LOFT: A 1 Million+ Token Long-Context Benchmark☆202Updated last week
- Let's build better datasets, together!☆259Updated 6 months ago
- This is the repository for our paper "INTERS: Unlocking the Power of Large Language Models in Search with Instruction Tuning"☆205Updated 6 months ago
- Maya: An Instruction Finetuned Multilingual Multimodal Model using Aya☆112Updated last month
- Large language models (LLMs) made easy, EasyLM is a one stop solution for pre-training, finetuning, evaluating and serving LLMs in JAX/Fl…☆75Updated 10 months ago
- ☆123Updated 8 months ago
- The HELMET Benchmark☆154Updated 2 months ago
- Codebase accompanying the Summary of a Haystack paper.☆78Updated 9 months ago
- LongEmbed: Extending Embedding Models for Long Context Retrieval (EMNLP 2024)☆137Updated 7 months ago
- ☆115Updated 4 months ago
- A pipeline for LLM knowledge distillation☆104Updated 2 months ago