Sanster / padding_free_llm_train
⭐14 · Updated last year
Alternatives and similar repositories for padding_free_llm_train:
Users who are interested in padding_free_llm_train are comparing it to the libraries listed below.
- DPO, but faster ⭐34 · Updated 2 months ago
- RWKV-7: Surpassing GPT ⭐79 · Updated 3 months ago
- ⭐31 · Updated 8 months ago
- Implementation of the paper: "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention" from Google in PyTo… ⭐53 · Updated last week
- Utilities for Training Very Large Models ⭐57 · Updated 4 months ago
- A specialized RWKV-7 model for Othello (a.k.a. Reversi) that predicts legal moves, evaluates positions, and performs in-context search. It… ⭐38 · Updated 3 weeks ago
- An Experiment on Dynamic NTK Scaling RoPE ⭐62 · Updated last year
- A repository for research on medium-sized language models. ⭐76 · Updated 8 months ago
- Fast LLM training codebase with dynamic strategy selection [DeepSpeed+Megatron+FlashAttention+CudaFusionKernel+Compiler] ⭐36 · Updated last year
- Using FlexAttention to compute attention with different masking patterns ⭐40 · Updated 4 months ago
- imagetokenizer is a Python package that helps you encode visuals and generate visual token IDs from a codebook; supports both image and video… ⭐30 · Updated 7 months ago
- Official repository for the ICML 2024 paper "MoRe Fine-Tuning with 10x Fewer Parameters" ⭐17 · Updated this week
- Code for the paper "Harnessing Webpage UIs for Text-Rich Visual Understanding" ⭐48 · Updated 2 months ago
- Repository for Sparse Finetuning of LLMs via a modified version of the MosaicML llmfoundry ⭐40 · Updated last year
- This repo is based on https://github.com/jiaweizzhao/GaLore ⭐24 · Updated 5 months ago
- Demonstration that fine-tuning a RoPE model on longer sequences than the pre-trained model adapts the model's context limit ⭐63 · Updated last year
- Implementation of "LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models"β42Updated 3 months ago
- Implementation of the Mamba SSM with hf_integration.β56Updated 5 months ago
- LMTuner: Make the LLM Better for Everyone ⭐33 · Updated last year
- In-Context Alignment: Chat with Vanilla Language Models Before Fine-Tuning ⭐33 · Updated last year
- ⭐42 · Updated this week
- ⭐12 · Updated last month
- A Framework for Decoupling and Assessing the Capabilities of VLMs ⭐40 · Updated 7 months ago
- Official implementation of the ICML 2024 paper RoSA (Robust Adaptation) ⭐38 · Updated last year
- Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention" ⭐96 · Updated 4 months ago
- The simplest implementation of recent Sparse Attention patterns for efficient LLM inference. ⭐57 · Updated 3 weeks ago
- [ACL 2024] RelayAttention for Efficient Large Language Model Serving with Long System Prompts ⭐38 · Updated 11 months ago
- Contextual Position Encoding, but with some custom CUDA kernels: https://arxiv.org/abs/2405.18719 ⭐22 · Updated 8 months ago