sail-sg/sailcraft

🚢 Data Toolkit for Sailor Language Models

☆94

Alternatives and similar repositories for sailcraft

Users who are interested in sailcraft are comparing it to the libraries listed below.

sail-sg / sailor-llm
[EMNLP-2024] ⚓️ Sailor: Open Language Models for South-East Asia
☆139 · Dec 21, 2024 · Updated last year
sail-sg / ActivePRM
☆21 · Apr 16, 2025 · Updated last year
sail-sg / sailor2
🔱 Sailor2: Sailing in South-East Asia with Inclusive Multilingual LLMs
☆73 · Mar 21, 2025 · Updated last year
oriyor / turning_tables
Implementation of the paper: "Turning Tables: Generating Examples from Semi-structured Tables for Endowing Language Models with Reasoning…
☆22 · Nov 2, 2021 · Updated 4 years ago
Yale-LILY / FeTaQA
Dataset for TACL 2022 paper: "FeTaQA: Free-form Table Question Answering"
☆90 · May 11, 2023 · Updated 3 years ago
SivilTaram / code-html-to-markdown
A lightweight script for processing HTML page to markdown format with support for code blocks
☆81 · Apr 14, 2024 · Updated 2 years ago
ruc-datalab / PASTA
This repository contains source code for the PASTA model, a pre-trained language model for table-based fact verification.
☆18 · Dec 27, 2022 · Updated 3 years ago
google-research-datasets / QAmeleon
QAmeleon introduces synthetic multilingual QA data using PaLM, a 540B large language model. This dataset was generated by prompt tuning P…
☆34 · Aug 15, 2023 · Updated 2 years ago
Rojak-NLP / LLM-Code-Mixing
Can LLMs generate code-mixed sentences through zero-shot prompting?
☆11 · Apr 18, 2023 · Updated 3 years ago
HLTCHKUST / cqr4cqa
☆13 · Sep 6, 2022 · Updated 3 years ago
real-absolute-AI / Unnatural_Language
The official repository of 'Unnatural Language Are Not Bugs but Features for LLMs'
☆24 · May 20, 2025 · Updated last year
sail-sg / I-FSJ
Improved Few-Shot Jailbreaking Can Circumvent Aligned Language Models and Their Defenses (NeurIPS 2024)
☆65 · Jan 11, 2025 · Updated last year
xszheng2020 / memorization
An Empirical Study of Memorization in NLP (ACL 2022)
☆13 · Jun 22, 2022 · Updated 4 years ago
sail-sg / closer-look-LLM-unlearning
[ICLR 2025] A Closer Look at Machine Unlearning for Large Language Models
☆49 · Dec 4, 2024 · Updated last year
sail-sg / tty-use
☆15 · Oct 13, 2025 · Updated 9 months ago
DAMO-NLP-SG / DAMO-SeaLLMs
[ACL 2024 Demo] SeaLLMs - Large Language Models for Southeast Asia
☆175 · Jul 30, 2024 · Updated last year
TIGER-AI-Lab / MAmmoTH2
Official code for "MAmmoTH2: Scaling Instructions from the Web" [NeurIPS 2024]
☆146 · Oct 27, 2024 · Updated last year
haorannlp / mix
Code for "Mixed Cross Entropy Loss for Neural Machine Translation"
☆20 · Jul 23, 2021 · Updated 5 years ago
sail-sg / Cheating-LLM-Benchmarks
[ICLR 2025] Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates (Oral)
☆86 · Oct 23, 2024 · Updated last year
BM-K / KoDiffCSE
Difference-based Contrastive Learning for Korean Sentence Embeddings
☆23 · Mar 11, 2026 · Updated 4 months ago
GSYfate / knnlm-limits
Official code repo for paper "Great Memory, Shallow Reasoning: Limits of kNN-LMs"
☆24 · Apr 30, 2025 · Updated last year
infotabs / infotabs
☆14 · Sep 30, 2021 · Updated 4 years ago
idiom-bytes / flaskGPT
Waffer-thin FlaskGPT on Vercel.
☆12 · Jun 1, 2023 · Updated 3 years ago
IndoNLP / nusa-writes
NusaWrites is an in-depth analysis of corpora collection strategy and a comprehensive language modeling benchmark for underrepresented an…
☆30 · Sep 27, 2024 · Updated last year
sail-sg / regmix
[ICLR 2025] 🧬 RegMix: Data Mixture as Regression for Language Model Pre-training (Spotlight)
☆195 · Feb 17, 2025 · Updated last year
siyan-zhao / prepacking
The source code of our work "Prepacking: A Simple Method for Fast Prefilling and Increased Throughput in Large Language Models" [AISTATS …
☆62 · Oct 11, 2024 · Updated last year
fajri91 / IndoMMLU
☆41 · Oct 10, 2023 · Updated 2 years ago
QwenLM / online_merging_optimizers
Implementations of online merging optimizers proposed by Online Merging Optimizers for Boosting Rewards and Mitigating Tax in Alignment
☆82 · Jun 19, 2024 · Updated 2 years ago
sail-sg / AnytimeReasoner
Optimizing Anytime Reasoning via Budget Relative Policy Optimization
☆54 · Jul 15, 2025 · Updated last year
SeaEval / SeaEval
NAACL 2024: SeaEval for Multilingual Foundation Models: From Cross-Lingual Alignment to Cultural Reasoning
☆26 · Mar 3, 2025 · Updated last year
Chen-Wang-CUHK / Training-Free-and-Ref-Free-Summ-Evaluation
The source code of our ACL paper "A Training-free and Reference-free Summarization Evaluation Metric via Centrality-weighted Relevance an…
☆14 · May 6, 2023 · Updated 3 years ago
jjzha / cartography-al
Code base for the EMNLP 2021 Findings paper: Cartography Active Learning
☆14 · Jun 3, 2025 · Updated last year
xwgeng / RewriteNAT
Learning to Rewrite for Non-Autoregressive Neural Machine Translation
☆21 · Dec 23, 2021 · Updated 4 years ago
microsoft / SCoRE
ICLR 2021: Pre-Training for Context Representation in Conversational Semantic Parsing
☆31 · Aug 30, 2021 · Updated 4 years ago
Laz4rz / mup
Minimal (truly) muP implementation, consistent with TP4 and TP5 papers notation
☆14 · Jan 2, 2026 · Updated 6 months ago
alexa / places
This is the code for our paper: PLACES: Prompting Language Models for Social Conversation Synthesis
☆11 · Feb 17, 2023 · Updated 3 years ago
qiangning / TORQUE-dataset
☆17 · Feb 12, 2024 · Updated 2 years ago
huggingface / datatrove
Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.
☆3,237 · Jul 22, 2026 · Updated last week
dinobby / MAGDi
The code implementation of MAGDi: Structured Distillation of Multi-Agent Interaction Graphs Improves Reasoning in Smaller Language Models…
☆40 · Feb 5, 2024 · Updated 2 years ago