Sanster / padding_free_llm_train

☆16

Alternatives and similar repositories for padding_free_llm_train

Users that are interested in padding_free_llm_train are comparing it to the libraries listed below

Sorting:

Zyphra / Zyda_processing
☆33Updated 10 months ago
BBuf / RWKV-World-HF-Tokenizer
☆34Updated 9 months ago
frankxwang / dpo-prefix-sharing
DPO, but faster 🚀
☆42Updated 5 months ago
gabrielolympie / moe-pruner
A repository aimed at pruning DeepSeek V3, R1 and R1-zero to a usable size
☆52Updated last month
Infini-AI-Lab / gsm_infinite
☆45Updated 2 months ago
SprocketLab / sparse_matrix_fine_tuning
Official repository for ICML 2024 paper "MoRe Fine-Tuning with 10x Fewer Parameters"
☆18Updated this week
eth-easl / fmengine
Utilities for Training Very Large Models
☆58Updated 7 months ago
FreedomIntelligence / FastLLM
Fast LLM Training CodeBase With dynamic strategy choosing [Deepspeed+Megatron+FlashAttention+CudaFusionKernel+Compiler];
☆37Updated last year
NormXU / Consistent-DynamicNTKRoPE
An Experiment on Dynamic NTK Scaling RoPE
☆64Updated last year
SalesforceAIResearch / GemFilter
☆78Updated 4 months ago
rayleizhu / vllm-ra
[ACL 2024] RelayAttention for Efficient Large Language Model Serving with Long System Prompts
☆39Updated last year
Tomorrowdawn / top_nsigma
The official code repo and data hub of top_nsigma sampling strategy for LLMs.
☆24Updated 3 months ago
BBuf / flash-rwkv
☆30Updated 11 months ago
thu-coai / MiniPLM
[ICLR 2025] MiniPLM: Knowledge Distillation for Pre-Training Language Models
☆45Updated 5 months ago
tridao / flash-attention-wheels
☆49Updated last year
li-plus / flash-preference
Accelerate LLM preference tuning via prefix sharing with a single line of code
☆41Updated 2 weeks ago
18907305772 / FuseAI
FuseAI Project
☆86Updated 3 months ago
siyan-zhao / prepacking
The source code of our work "Prepacking: A Simple Method for Fast Prefilling and Increased Throughput in Large Language Models" [AISTATS …
☆59Updated 7 months ago
GAIR-NLP / Entropy-ABF
Official implementation for 'Extending LLMs’ Context Window with 100 Samples'
☆77Updated last year
thepowerfuldeez / OLMo
My fork os allen AI's OLMo for educational purposes.
☆30Updated 5 months ago
chu-tianxiang / QuIP-for-all
QuIP quantization
☆52Updated last year
DAMO-NLP-SG / CLEX
[ICLR 2024] CLEX: Continuous Length Extrapolation for Large Language Models
☆77Updated last year
IST-DASLab / SparseFinetuning
Repository for Sparse Finetuning of LLMs via modified version of the MosaicML llmfoundry
☆41Updated last year
feifeibear / Odysseus-Transformer
Odysseus: Playground of LLM Sequence Parallelism
☆69Updated 11 months ago
sheryc / resonance_rope
[ACL 24 Findings] Implementation of Resonance RoPE and the PosGen synthetic dataset.
☆22Updated last year
thu-ml / low-bit-optimizers
Low-bit optimizers for PyTorch
☆128Updated last year
RobertCsordas / moe_attention
Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention"
☆97Updated 7 months ago
kyleliang919 / Super_Muon
☆55Updated last month
kyegomez / Infini-attention
Implementation of the paper: "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention" from Google in pyTO…
☆55Updated 3 weeks ago
kaiokendev / cutoff-len-is-context-len
Demonstration that finetuning RoPE model on larger sequences than the pre-trained model adapts the model context limit
☆63Updated last year