huggingface / distill-bloom-deepspeedLinks

Teacher - student distillation using DeepSpeed

☆18

Alternatives and similar repositories for distill-bloom-deepspeed

Users that are interested in distill-bloom-deepspeed are comparing it to the libraries listed below

Sorting:

huggingface / transformers_bloom_parallel
Techniques used to run BLOOM at inference in parallel
☆37Updated 2 years ago
amazon-science / dq-bart
DQ-BART: Efficient Sequence-to-Sequence Model via Joint Distillation and Quantization (ACL 2022)
☆50Updated 2 years ago
kernelmachine / silo-lm
SILO Language Models code repository
☆81Updated last year
huawei-noah / Efficient-NLP
☆95Updated last year
sunyt32 / torchscale
Transformers at any scale
☆41Updated last year
jaymody / speculative-sampling
Simple implementation of Speculative Sampling in NumPy for GPT-2.
☆95Updated last year
seonghyeonye / TAPP
[AAAI 2024] Investigating the Effectiveness of Task-Agnostic Prefix Prompt for Instruction Following
☆79Updated 10 months ago
joeljang / ELM
[ICML 2023] Exploring the Benefits of Training Expert Language Models over Instruction Tuning
☆98Updated 2 years ago
r-three / RAD
Reference implementation for Reward-Augmented Decoding: Efficient Controlled Text Generation With a Unidirectional Reward Model
☆43Updated last year
huggingface / olm-training
Repo for training MLMs, CLMs, or T5-type models on the OLM pretraining data, but it should work with any hugging face text dataset.
☆93Updated 2 years ago
benzakenelad / BitFit
Simple Parameter-efficient Fine-tuning for Transformer-based Masked Language-models
☆142Updated 2 years ago
yxuansu / Contrastive_Search_Is_What_You_Need
[TMLR'23] Contrastive Search Is What You Need For Neural Text Generation
☆119Updated 2 years ago
john-hewitt / backpacks-flash-attn
The original Backpack Language Model implementation, a fork of FlashAttention
☆69Updated 2 years ago
HanGuo97 / lq-lora
☆127Updated last year
hadasah / btm
☆75Updated last year
kaistAI / GAP
[ACL 2023] Gradient Ascent Post-training Enhances Language Model Generalization
☆29Updated 10 months ago
snu-mllab / Context-Memory
Pytorch implementation for "Compressed Context Memory For Online Language Model Interaction" (ICLR'24)
☆61Updated last year
Lightning-Universe / lightning-ColossalAI
Large Scale Distributed Model Training strategy with Colossal AI and Lightning AI
☆57Updated last year
McGill-NLP / length-generalization
Code for the paper "The Impact of Positional Encoding on Length Generalization in Transformers", NeurIPS 2023
☆136Updated last year
microsoft / AdaMix
This is the implementation of the paper AdaMix: Mixture-of-Adaptations for Parameter-efficient Model Tuning (https://arxiv.org/abs/2205.1…
☆132Updated last year
ShiZhengyan / InstructionModelling
[NeurIPS 2024 Main Track] Code for the paper titled "Instruction Tuning With Loss Over Instructions"
☆38Updated last year
kyleliang919 / Long-context-transformers
Exploring finetuning public checkpoints on filter 8K sequences on Pile
☆116Updated 2 years ago
kssteven418 / BigLittleDecoder
[NeurIPS'23] Speculative Decoding with Big Little Decoder
☆93Updated last year
tau-nlp / scrolls
The official code of EMNLP 2022, "SCROLLS: Standardized CompaRison Over Long Language Sequences".
☆70Updated last year
PythonNut / superbpe
Official code release for "SuperBPE: Space Travel for Language Models"
☆61Updated this week
ThomasScialom / T0_continual_learning
Adding new tasks to T0 without catastrophic forgetting
☆33Updated 2 years ago
microsoft / mutransformers
some common Huggingface transformers in maximal update parametrization (µP)
☆81Updated 3 years ago
renll / SeqBoat
[NeurIPS 2023] Sparse Modular Activation for Efficient Sequence Modeling
☆37Updated last year
seonghyeonye / Flipped-Learning
[ICLR 2023] Guess the Instruction! Flipped Learning Makes Language Models Stronger Zero-Shot Learners
☆116Updated 2 weeks ago
sgugger / torchdynamo-tests
☆19Updated 2 years ago