🚢 Data Toolkit for Sailor Language Models
★96 · Feb 24, 2025 · Updated last year
Alternatives and similar repositories for sailcraft
Users that are interested in sailcraft are comparing it to the libraries listed below.
- [EMNLP-2024] ⚓️ Sailor: Open Language Models for South-East Asia ★138 · Dec 21, 2024 · Updated last year
- ★20 · Apr 16, 2025 · Updated last year
- 🔱 Sailor2: Sailing in South-East Asia with Inclusive Multilingual LLMs ★71 · Mar 21, 2025 · Updated last year
- Implementation of the paper: "Turning Tables: Generating Examples from Semi-structured Tables for Endowing Language Models with Reasoning… ★22 · Nov 2, 2021 · Updated 4 years ago
- Dataset for TACL 2022 paper: "FeTaQA: Free-form Table Question Answering" ★88 · May 11, 2023 · Updated 2 years ago
- The official repository of 'Unnatural Language Are Not Bugs but Features for LLMs' ★24 · May 20, 2025 · Updated 10 months ago
- QAmeleon introduces synthetic multilingual QA data using PaLM, a 540B large language model. This dataset was generated by prompt tuning P… ★34 · Aug 15, 2023 · Updated 2 years ago
- This repository contains source code for the PASTA model, a pre-trained language model for table-based fact verification. ★18 · Dec 27, 2022 · Updated 3 years ago
- A lightweight script for processing HTML page to markdown format with support for code blocks ★82 · Apr 14, 2024 · Updated 2 years ago
- Can LLMs generate code-mixed sentences through zero-shot prompting? ★11 · Apr 18, 2023 · Updated 2 years ago
- ★13 · Sep 6, 2022 · Updated 3 years ago
- ★15 · Mar 12, 2024 · Updated 2 years ago
- Improved Few-Shot Jailbreaking Can Circumvent Aligned Language Models and Their Defenses (NeurIPS 2024) ★65 · Jan 11, 2025 · Updated last year
- An Empirical Study of Memorization in NLP (ACL 2022) ★13 · Jun 22, 2022 · Updated 3 years ago
- ★39 · Jul 13, 2022 · Updated 3 years ago
- Official code for "MAmmoTH2: Scaling Instructions from the Web" [NeurIPS 2024] ★149 · Oct 27, 2024 · Updated last year
- [ACL 2024 Demo] SeaLLMs - Large Language Models for Southeast Asia ★175 · Jul 30, 2024 · Updated last year
- Code for "Mixed Cross Entropy Loss for Neural Machine Translation" ★20 · Jul 23, 2021 · Updated 4 years ago
- [ICLR 2025] Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates (Oral) ★83 · Oct 23, 2024 · Updated last year
- The official repository for SkyLadder: Better and Faster Pretraining via Context Window Scheduling ★42 · Dec 29, 2025 · Updated 3 months ago
- [ACL 2025] Analyzing LLMs' Multilingual Knowledge Boundary Cognition Across Languages Through the Lens of Internal Representations ★18 · Oct 18, 2025 · Updated 5 months ago
- Difference-based Contrastive Learning for Korean Sentence Embeddings ★23 · Mar 11, 2026 · Updated last month
- Official code repo for paper "Great Memory, Shallow Reasoning: Limits of kNN-LMs" ★23 · Apr 30, 2025 · Updated 11 months ago
- Tiny evaluation of leading LLMs on competitive programming problems ★14 · Nov 28, 2024 · Updated last year
- [ICLR 2025] 🧬 RegMix: Data Mixture as Regression for Language Model Pre-training (Spotlight) ★189 · Feb 17, 2025 · Updated last year
- ★38 · Oct 10, 2023 · Updated 2 years ago
- ★14 · Sep 30, 2021 · Updated 4 years ago
- NusaWrites is an in-depth analysis of corpora collection strategy and a comprehensive language modeling benchmark for underrepresented an… ★28 · Sep 27, 2024 · Updated last year
- The source code of our work "Prepacking: A Simple Method for Fast Prefilling and Increased Throughput in Large Language Models" [AISTATS … ★60 · Oct 11, 2024 · Updated last year
- ★53 · Dec 7, 2025 · Updated 4 months ago
- Aioli: A unified optimization framework for language model data mixing ★32 · Jan 17, 2025 · Updated last year
- Implementations of online merging optimizers proposed by Online Merging Optimizers for Boosting Rewards and Mitigating Tax in Alignment ★82 · Jun 19, 2024 · Updated last year
- The source code of our ACL paper "A Training-free and Reference-free Summarization Evaluation Metric via Centrality-weighted Relevance an… ★14 · May 6, 2023 · Updated 2 years ago
- NAACL 2024: SeaEval for Multilingual Foundation Models: From Cross-Lingual Alignment to Cultural Reasoning ★26 · Mar 3, 2025 · Updated last year
- Code base for the EMNLP 2021 Findings paper: Cartography Active Learning ★14 · Jun 3, 2025 · Updated 10 months ago
- Learning to Rewrite for Non-Autoregressive Neural Machine Translation ★21 · Dec 23, 2021 · Updated 4 years ago
- Minimal (truly) muP implementation, consistent with TP4 and TP5 papers notation ★14 · Jan 2, 2026 · Updated 3 months ago
- ICLR 2021: Pre-Training for Context Representation in Conversational Semantic Parsing ★31 · Aug 30, 2021 · Updated 4 years ago
- This is the code for our paper: PLACES: Prompting Language Models for Social Conversation Synthesis ★11 · Feb 17, 2023 · Updated 3 years ago