sail-sg / sailcraft
🚢 Data Toolkit for Sailor Language Models
☆91 Updated 3 months ago
Alternatives and similar repositories for sailcraft
Users who are interested in sailcraft are comparing it to the repositories listed below.
- Official code for "MAmmoTH2: Scaling Instructions from the Web" [NeurIPS 2024] ☆142 Updated 7 months ago
- Official implementation for 'Extending LLMs’ Context Window with 100 Samples' ☆78 Updated last year
- LongEmbed: Extending Embedding Models for Long Context Retrieval (EMNLP 2024) ☆135 Updated 6 months ago
- Codebase accompanying the Summary of a Haystack paper. ☆78 Updated 8 months ago
- ☆62 Updated 10 months ago
- This repository combines the CPO and SimPO methods for better reference-free preference learning. ☆53 Updated 9 months ago
- Unofficial implementation of AlpaGasus ☆91 Updated last year
- Lightweight demos for finetuning LLMs. Powered by 🤗 transformers and open-source datasets. ☆77 Updated 7 months ago
- Organize the Web: Constructing Domains Enhances Pre-Training Data Curation ☆50 Updated last month
- ☆57 Updated 8 months ago
- ☆120 Updated 8 months ago
- Positional Skip-wise Training for Efficient Context Window Extension of LLMs to Extreme Length (ICLR 2024) ☆202 Updated last year
- [EMNLP 2024] A Retrieval Benchmark for Scientific Literature Search ☆89 Updated 6 months ago
- Reformatted Alignment ☆114 Updated 8 months ago
- [Data + code] ExpertQA: Expert-Curated Questions and Attributed Answers ☆128 Updated last year
- MultilingualSIFT: Multilingual Supervised Instruction Fine-tuning ☆89 Updated last year
- [ICLR 2023] Guess the Instruction! Flipped Learning Makes Language Models Stronger Zero-Shot Learners ☆116 Updated 8 months ago
- This project studies the performance and robustness of language models and task-adaptation methods. ☆150 Updated last year
- Code for KaLM-Embedding models ☆77 Updated 2 months ago
- AIR-Bench: Automated Heterogeneous Information Retrieval Benchmark ☆142 Updated 5 months ago
- We aim to provide the best references to search, select, and synthesize high-quality and large-quantity data for post-training your LLMs. ☆55 Updated 8 months ago
- Official repository for ACL 2025 paper "Model Extrapolation Expedites Alignment" ☆73 Updated 2 weeks ago
- Retrieval Augmented Generation Generalized Evaluation Dataset ☆53 Updated 6 months ago
- This is the official repository for Inheritune. ☆111 Updated 3 months ago
- Code for Zero-Shot Tokenizer Transfer ☆128 Updated 4 months ago
- ☆69 Updated last year
- Finetune mistral-7b-instruct for sentence embeddings ☆82 Updated last year
- Code accompanying "How I learned to start worrying about prompt formatting". ☆105 Updated 8 months ago
- Small and Efficient Mathematical Reasoning LLMs ☆71 Updated last year
- Evol-augment any dataset online ☆59 Updated last year