sail-sg / sailcraft
🚢 Data Toolkit for Sailor Language Models
☆94 Updated 10 months ago
Alternatives and similar repositories for sailcraft
Users who are interested in sailcraft are comparing it to the libraries listed below.
- Official code for "MAmmoTH2: Scaling Instructions from the Web" [NeurIPS 2024] ☆149 Updated last year
- LongEmbed: Extending Embedding Models for Long Context Retrieval (EMNLP 2024) ☆145 Updated last year
- ☆159 Updated last year
- Reformatted Alignment ☆113 Updated last year
- ☆129 Updated last year
- FollowIR: Evaluating and Teaching Information Retrieval Models to Follow Instructions ☆49 Updated last year
- Official implementation for 'Extending LLMs’ Context Window with 100 Samples' ☆81 Updated last year
- [ACL 2025 Findings] Autonomous Data Selection with Zero-shot Generative Classifiers for Mathematical Texts (As Huggingface Daily Papers: … ☆89 Updated last month
- Positional Skip-wise Training for Efficient Context Window Extension of LLMs to Extremely Length (ICLR 2024) ☆205 Updated last year
- ☆75 Updated last year
- [Data + code] ExpertQA : Expert-Curated Questions and Attributed Answers ☆136 Updated last year
- Codebase accompanying the Summary of a Haystack paper. ☆79 Updated last year
- Code for EMNLP 2024 paper "Learn Beyond The Answer: Training Language Models with Reflection for Mathematical Reasoning" ☆56 Updated last year
- Organize the Web: Constructing Domains Enhances Pre-Training Data Curation ☆73 Updated 7 months ago
- Retrieval Augmented Generation Generalized Evaluation Dataset ☆59 Updated 5 months ago
- Complex Function Calling Benchmark. ☆159 Updated 11 months ago
- Scripts for generating synthetic finetuning data for reducing sycophancy. ☆117 Updated 2 years ago
- Meta-CoT: Generalizable Chain-of-Thought Prompting in Mixed-task Scenarios with Large Language Models ☆100 Updated 2 years ago
- ☆69 Updated 2 years ago
- Benchmarking LLMs with Challenging Tasks from Real Users ☆246 Updated last year
- Synthetic Data Generation for Evaluation ☆13 Updated 10 months ago
- Official repository for paper "ReasonIR Training Retrievers for Reasoning Tasks". ☆212 Updated 6 months ago
- This is the official repository for Inheritune. ☆117 Updated 10 months ago
- Code accompanying "How I learned to start worrying about prompt formatting". ☆113 Updated 6 months ago
- [EMNLP'24] LongHeads: Multi-Head Attention is Secretly a Long Context Processor ☆31 Updated last year
- The official evaluation suite and dynamic data release for MixEval. ☆253 Updated last year
- [ICLR'25] Data and code for our paper "Why Does the Effective Context Length of LLMs Fall Short?" ☆78 Updated last year
- [EMNLP 2024] A Retrieval Benchmark for Scientific Literature Search ☆102 Updated last year
- Code for Zero-Shot Tokenizer Transfer ☆142 Updated 11 months ago
- Unofficial implementation of AlpaGasus ☆94 Updated 2 years ago