ducdauge/sft-llm

Scaling Sparse Fine-Tuning to Large Language Models

☆19

Alternatives and similar repositories for sft-llm

Users who are interested in sft-llm are comparing it to the libraries listed below.

AlanAnsell / peft
☆22 · Jul 5, 2024 · Updated 2 years ago
codogogo / towerparse
Tower Parse: Low-Resource Dependency Parsing via Hierarchical Source Selection
☆15 · Aug 20, 2021 · Updated 4 years ago
cambridgeltl / multi3woz
The official repository for Multi3WOZ: A Multilingual, Multi-Domain, Multi-Parallel Dataset for Training and Evaluating Culturally Adapte…
☆17 · Jan 15, 2024 · Updated 2 years ago
RUCAIBox / QuantizedEmpirical
☆15 · Sep 24, 2023 · Updated 2 years ago
sanderland / script_tok
Code for the paper "BPE stays on SCRIPT", "Which Pieces Does Unigram Tokenization Really Need?" and MinGram
☆18 · Jun 26, 2026 · Updated 3 weeks ago
stefan-it / ukrainian-electra
Ukrainian ELECTRA model
☆12 · Mar 11, 2023 · Updated 3 years ago
allenai / easy-to-hard-generalization
Code for the arXiv preprint "The Unreasonable Effectiveness of Easy Training Data"
☆48 · Jan 17, 2024 · Updated 2 years ago
AnesBenmerzoug / langsfer
A library for language transfer methods and algorithms.
☆16 · Feb 6, 2026 · Updated 5 months ago
hucsmn / suffix_array
Suffix array construction and searching algorithms for in-memory binary data.
☆13 · Sep 10, 2022 · Updated 3 years ago
yannikbenz / zeroe
From Hero to Zéroe: A Benchmark of Low-Level Adversarial Attacks
☆15 · Feb 23, 2023 · Updated 3 years ago
zhangsichengsjtu / AFPQ
AFPQ code implementation
☆23 · Nov 6, 2023 · Updated 2 years ago
TDCSZ327 / HTmuon
☆15 · May 2, 2026 · Updated 2 months ago
danbider / lora-tradeoffs
Information and artifacts for "LoRA Learns Less and Forgets Less" (TMLR, 2024)
☆22 · Sep 27, 2024 · Updated last year
RUCAIBox / BAMBOO
☆36 · Mar 25, 2024 · Updated 2 years ago
nmrksic / LEAR
Specialising Word Vectors for Lexical Entailment
☆29 · Sep 13, 2018 · Updated 7 years ago
edwardmilsom / function-space-learning-rates-paper
Code for the paper "Function-Space Learning Rates"
☆23 · Jun 3, 2025 · Updated last year
cambridgeltl / xcopa
XCOPA: A Multilingual Dataset for Causal Commonsense Reasoning
☆105 · Feb 4, 2021 · Updated 5 years ago
rdnfn / icai
Inverse Constitutional AI [ICLR 2025]: compressing pairwise preference data into a short constitution of principles.
☆42 · May 6, 2026 · Updated 2 months ago
zihanghliu / TiDE
Implementation of Paper: Long-term Forecasting with TiDE: Time-series Dense Encoder
☆20 · Nov 1, 2024 · Updated last year
genlm / genlm-backend
High-performance backend for language model probabilistic programs
☆17 · Jun 29, 2026 · Updated 3 weeks ago
MayDomine / Seq1F1B
Sequence-level 1F1B schedule for LLMs.
☆19 · Jun 4, 2024 · Updated 2 years ago
fdschmidt93 / trident-nllb-llm2vec
Repository for "Self-Distillation for Model Stacking Unlocks Cross-Lingual NLU in 200+ Languages"
☆15 · Oct 4, 2024 · Updated last year
bytedance / AffineQuant
Official implementation of the ICLR 2024 paper AffineQuant
☆30 · Mar 30, 2024 · Updated 2 years ago
UNITES-Lab / MC-SMoE
[ICLR‘24 Spotlight] Code for the paper "Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy"
☆108 · Jun 20, 2025 · Updated last year
xvyaward / owq
Code for the AAAI 2024 Oral paper "OWQ: Outlier-Aware Weight Quantization for Efficient Fine-Tuning and Inference of Large Language Model…
☆72 · Mar 7, 2024 · Updated 2 years ago
krangelie / bias-in-german-nlg
Master thesis: Exploring bias in German NLG (GPT-3 & GerPT-2). Applies regard classification and bias mitigation triggers.
☆16 · Sep 25, 2024 · Updated last year
xyLiu339 / OSDD
[Arxiv 2025] One-Step Diffusion Model for Image Motion-Deblurring
☆21 · Mar 11, 2025 · Updated last year
LCM-Lab / Elastic-Attention
Elastic Attention: Test-time Adaptive Sparsity Ratios for Efficient Transformers
☆24 · May 26, 2026 · Updated 2 months ago
Zhang-Yihao / Adversarial-Representation-Engineering
Official implementation repository for the paper Towards General Conceptual Model Editing via Adversarial Representation Engineering.
☆20 · Dec 6, 2024 · Updated last year
bshillingford / ctc-beam-search
CTC beam search
☆12 · Oct 26, 2016 · Updated 9 years ago
whyNLP / Conic10K
Conic10K: A large-scale dataset for closed-vocabulary math problem understanding. Accepted to EMNLP2023 Findings.
☆33 · Dec 6, 2023 · Updated 2 years ago
launchnlp / SEESAW
Code, data, and models for "Generative Entity-to-Entity Stance Detection with Knowledge Graph Augmentation"
☆13 · May 10, 2024 · Updated 2 years ago
ictnlp / TACS
Source code for Truth-Aware Context Selection: Mitigating the Hallucinations of Large Language Models Being Misled by Untruthful Contexts
☆17 · Sep 2, 2024 · Updated last year
gautierdag / tokenizer-bench
Code for the paper "Getting the most out of your tokenizer for pre-training and domain adaptation"
☆22 · Feb 14, 2024 · Updated 2 years ago
messense / crfsuite-rs
Rust binding to crfsuite
☆25 · Jan 31, 2026 · Updated 5 months ago
PiotrNawrot / sparse-frontier
The evaluation framework for training-free sparse attention in LLMs
☆127 · Jan 27, 2026 · Updated 5 months ago
inria-thoth / ddm4ip
Diffusion distribution matching for inverse problems
☆17 · Sep 16, 2025 · Updated 10 months ago
jax-state-spaces / mamba2-jax
mamba2-jax: A pure JAX/Flax implementation of Mamba-2 for language modeling and time series forecasting.
☆16 · Jun 23, 2026 · Updated last month
SDLAML / disco
☆16 · Dec 11, 2025 · Updated 7 months ago