sail-sg / sailcraft
Data Toolkit for Sailor Language Models
★96 · Feb 24, 2025 · Updated 11 months ago
Alternatives and similar repositories for sailcraft
Users who are interested in sailcraft are comparing it to the libraries listed below.
- [EMNLP-2024] Sailor: Open Language Models for South-East Asia ★138 · Dec 21, 2024 · Updated last year
- ★20 · Apr 16, 2025 · Updated 9 months ago
- Sailor2: Sailing in South-East Asia with Inclusive Multilingual LLMs ★71 · Mar 21, 2025 · Updated 10 months ago
- Implementation of the paper: "Turning Tables: Generating Examples from Semi-structured Tables for Endowing Language Models with Reasoning… ★22 · Nov 2, 2021 · Updated 4 years ago
- QAmeleon introduces synthetic multilingual QA data using PaLM, a 540B large language model. This dataset was generated by prompt tuning P… ★35 · Aug 15, 2023 · Updated 2 years ago
- Can LLMs generate code-mixed sentences through zero-shot prompting? ★11 · Apr 18, 2023 · Updated 2 years ago
- Dataset for TACL 2022 paper: "FeTaQA: Free-form Table Question Answering" ★87 · May 11, 2023 · Updated 2 years ago
- Improved Few-Shot Jailbreaking Can Circumvent Aligned Language Models and Their Defenses (NeurIPS 2024) ★65 · Jan 11, 2025 · Updated last year
- This repository contains source code for the PASTA model, a pre-trained language model for table-based fact verification. ★18 · Dec 27, 2022 · Updated 3 years ago
- Official code repo for paper "Great Memory, Shallow Reasoning: Limits of kNN-LMs" ★23 · Apr 30, 2025 · Updated 9 months ago
- ★38 · Jul 13, 2022 · Updated 3 years ago
- Official code for "MAmmoTH2: Scaling Instructions from the Web" [NeurIPS 2024] ★149 · Oct 27, 2024 · Updated last year
- The code implementation of MAGDi: Structured Distillation of Multi-Agent Interaction Graphs Improves Reasoning in Smaller Language Models… ★40 · Feb 5, 2024 · Updated 2 years ago
- A lightweight script for processing HTML page to markdown format with support for code blocks ★82 · Apr 14, 2024 · Updated last year
- The source code of our work "Prepacking: A Simple Method for Fast Prefilling and Increased Throughput in Large Language Models" [AISTATS … ★60 · Oct 11, 2024 · Updated last year
- Embedding Recycling for Language models ★38 · Jul 11, 2023 · Updated 2 years ago
- Aioli: A unified optimization framework for language model data mixing ★32 · Jan 17, 2025 · Updated last year
- Difference-based Contrastive Learning for Korean Sentence Embeddings ★23 · Updated this week
- Code for "Mixed Cross Entropy Loss for Neural Machine Translation" ★20 · Jul 23, 2021 · Updated 4 years ago
- An automated tool for discovering insights from research paper corpora ★137 · Jun 8, 2024 · Updated last year
- This app is intended to automatically create a corpus for ASR systems using pseudo-labeling. ★27 · Feb 15, 2024 · Updated last year
- Code for Zero-Shot Tokenizer Transfer ★142 · Jan 14, 2025 · Updated last year
- JAX Scalify: end-to-end scaled arithmetics ★18 · Oct 30, 2024 · Updated last year
- Official PyTorch implementation of CD-MOE ★12 · Mar 29, 2025 · Updated 10 months ago
- Seamless Voice Interactions with LLMs ★12 · Oct 28, 2023 · Updated 2 years ago
- ★18 · Mar 2, 2025 · Updated 11 months ago
- Tiny evaluation of leading LLMs on competitive programming problems ★14 · Nov 28, 2024 · Updated last year
- Implementations of online merging optimizers proposed by Online Merging Optimizers for Boosting Rewards and Mitigating Tax in Alignment ★81 · Jun 19, 2024 · Updated last year
- Official implementation of Bootstrapping Language Models via DPO Implicit Rewards ★47 · Apr 15, 2025 · Updated 10 months ago
- [ICLR 2025] Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates (Oral) ★84 · Oct 23, 2024 · Updated last year
- ★23 · Nov 20, 2021 · Updated 4 years ago
- The repository contains code for Adaptive Data Optimization ★32 · Dec 9, 2024 · Updated last year
- Apps that run on modal.com ★13 · Sep 14, 2025 · Updated 5 months ago
- ★10 · Oct 24, 2024 · Updated last year
- Implementation for the paper "Fictitious Synthetic Data Can Improve LLM Factuality via Prerequisite Learning" ★11 · Jan 10, 2025 · Updated last year
- Minimal (truly) muP implementation, consistent with TP4 and TP5 papers notation ★14 · Jan 2, 2026 · Updated last month
- An Empirical Study of Memorization in NLP (ACL 2022) ★13 · Jun 22, 2022 · Updated 3 years ago
- [ACL 2025] Analyzing LLMs' Multilingual Knowledge Boundary Cognition Across Languages Through the Lens of Internal Representations ★16 · Oct 18, 2025 · Updated 3 months ago
- ★15 · Jun 2, 2025 · Updated 8 months ago