sail-sg / sailcraft
Data Toolkit for Sailor Language Models
88 stars · Updated last month
Alternatives and similar repositories for sailcraft:
Users who are interested in sailcraft are comparing it to the libraries listed below.
- Official code for "MAmmoTH2: Scaling Instructions from the Web" [NeurIPS 2024] · 137 stars · Updated 5 months ago
- Unofficial implementation of AlpaGasus · 90 stars · Updated last year
- 119 stars · Updated 6 months ago
- Official implementation for 'Extending LLMs' Context Window with 100 Samples' · 76 stars · Updated last year
- Codebase accompanying the Summary of a Haystack paper. · 77 stars · Updated 6 months ago
- We aim to provide the best references to search, select, and synthesize high-quality and large-quantity data for post-training your LLMs. · 54 stars · Updated 6 months ago
- Reformatted Alignment · 115 stars · Updated 6 months ago
- Organize the Web: Constructing Domains Enhances Pre-Training Data Curation · 42 stars · Updated last month
- 69 stars · Updated last year
- LOFT: A 1 Million+ Token Long-Context Benchmark · 184 stars · Updated last week
- LongEmbed: Extending Embedding Models for Long Context Retrieval (EMNLP 2024) · 132 stars · Updated 5 months ago
- [EMNLP 2024] A Retrieval Benchmark for Scientific Literature Search · 79 stars · Updated 4 months ago
- Official implementation of paper "Autonomous Data Selection with Language Models for Mathematical Texts" (As Huggingface Daily Papers: ht… · 80 stars · Updated 5 months ago
- 145 stars · Updated 11 months ago
- This repository contains the joint use of CPO and SimPO method for better reference-free preference learning methods. · 52 stars · Updated 8 months ago
- Implementation of "SelfCite: Self-Supervised Alignment for Context Attribution in Large Language Models" · 27 stars · Updated 2 months ago
- Code and Data for "Long-context LLMs Struggle with Long In-context Learning" [TMLR2025] · 105 stars · Updated last month
- MultilingualSIFT: Multilingual Supervised Instruction Fine-tuning · 90 stars · Updated last year
- Benchmarking LLMs with Challenging Tasks from Real Users · 221 stars · Updated 5 months ago
- [Data + code] ExpertQA : Expert-Curated Questions and Attributed Answers · 126 stars · Updated last year
- 62 stars · Updated 8 months ago
- Positional Skip-wise Training for Efficient Context Window Extension of LLMs to Extreme Lengths (ICLR 2024) · 205 stars · Updated 10 months ago
- AIR-Bench: Automated Heterogeneous Information Retrieval Benchmark · 134 stars · Updated 3 months ago
- Small and Efficient Mathematical Reasoning LLMs · 71 stars · Updated last year
- 67 stars · Updated last year
- Official repository for paper "Weak-to-Strong Extrapolation Expedites Alignment" · 74 stars · Updated 10 months ago
- This project studies the performance and robustness of language models and task-adaptation methods. · 149 stars · Updated 10 months ago
- [NeurIPS 2024] Train LLMs with diverse system messages reflecting individualized preferences to generalize to unseen system messages · 45 stars · Updated 4 months ago
- [arXiv preprint] Official Repository for "Evaluating Language Models as Synthetic Data Generators" · 34 stars · Updated 4 months ago
- Flacuna was developed by fine-tuning Vicuna on Flan-mini, a comprehensive instruction collection encompassing various tasks. Vicuna is al… · 111 stars · Updated last year