sail-sg / sailor-llm

[EMNLP-2024] ⚓️ Sailor: Open Language Models for South-East Asia

☆132

Alternatives and similar repositories for sailor-llm

Users who are interested in sailor-llm are comparing it to the repositories listed below.

sail-sg / sailcraft
🚢 Data Toolkit for Sailor Language Models
☆93 Updated 4 months ago
GAIR-NLP / ReAlign
Reformatted Alignment
☆113 Updated 9 months ago
shizhediao / Post-Training-Data-Flywheel
We aim to provide the best references to search, select, and synthesize high-quality and large-quantity data for post-training your LLMs.
☆57 Updated 8 months ago
DAMO-NLP-SG / DAMO-SeaLLMs
[ACL 2024 Demo] SeaLLMs - Large Language Models for Southeast Asia
☆168 Updated 10 months ago
FreedomIntelligence / MultilingualSIFT
MultilingualSIFT: Multilingual Supervised Instruction Fine-tuning
☆91 Updated last year
sail-sg / sailor2
🔱 Sailor2: Sailing in South-East Asia with Inclusive Multilingual LLMs
☆62 Updated 3 months ago
yegcjs / mixinglaws
☆101 Updated 8 months ago
LLM360 / MegaMath
An Open Math Pre-training Dataset with 370B Tokens.
☆89 Updated 2 months ago
princeton-nlp / CEPE
[ACL 2024] Long-Context Language Modeling with Parallel Encodings
☆155 Updated last year
sail-sg / regmix
[ICLR 2025] 🧬 RegMix: Data Mixture as Regression for Language Model Pre-training (Spotlight)
☆149 Updated 4 months ago
SkyworkAI / Skywork-MoE
Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models
☆133 Updated last year
QwenLM / online_merging_optimizers
Implementations of the online merging optimizers proposed in "Online Merging Optimizers for Boosting Rewards and Mitigating Tax in Alignment"
☆75 Updated last year
princeton-nlp / QuRating
[ICML 2024] Selecting High-Quality Data for Training Language Models
☆176 Updated last year
fxmeng / mixtral_spliter
Converting Mixtral-8x7B to Mixtral-[1~7]x7B
☆22 Updated last year
TIGER-AI-Lab / General-Reasoner
General Reasoner: Advancing LLM Reasoning Across All Domains
☆142 Updated 2 weeks ago
FlagOpen / Infinity-Instruct
☆48 Updated last year
GAIR-NLP / OPO
☆50 Updated last year
TIGER-AI-Lab / LongICLBench
Code and Data for "Long-context LLMs Struggle with Long In-context Learning" [TMLR 2025]
☆105 Updated 4 months ago
CodeCreator / WebOrganizer
Organize the Web: Constructing Domains Enhances Pre-Training Data Curation
☆55 Updated last month
tianyi-lab / Superfiltering
[ACL'24] Superfiltering: Weak-to-Strong Data Filtering for Fast Instruction-Tuning
☆160 Updated this week
dwzhu-pku / LongEmbed
LongEmbed: Extending Embedding Models for Long Context Retrieval (EMNLP 2024)
☆137 Updated 7 months ago
krystalan / DRT
Deep Reasoning Translation (DRT) Project
☆225 Updated last month
QwenLM / WorldPM
☆86 Updated last month
SalesforceAIResearch / GemFilter
☆80 Updated 5 months ago
NormXU / Consistent-DynamicNTKRoPE
An Experiment on Dynamic NTK Scaling RoPE
☆64 Updated last year
jshuadvd / LongRoPE
Implementation of the paper "LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens"
☆137 Updated 11 months ago
dwzhu-pku / PoSE
Positional Skip-wise Training for Efficient Context Window Extension of LLMs to Extreme Lengths (ICLR 2024)
☆203 Updated last year
THUDM / LongAlign
[EMNLP 2024] LongAlign: A Recipe for Long Context Alignment of LLMs
☆250 Updated 6 months ago
GAIR-NLP / AIME-Preview
☆68 Updated 3 months ago
GeneZC / MiniMA
Code for paper titled "Towards the Law of Capacity Gap in Distilling Language Models"
☆99 Updated 11 months ago